# Building a “Google Maps” for molecular and cellular biology
**Reporting period: 01/04/2023 -- 30/09/2023**
## Notable/substantive activity/outcomes in research progress
### Affinity-VAE
We have developed a machine learning method named Affinity-VAE (Variational Autoencoder), which enhances the performance of $\beta$-VAE by increasing the interpretability of the learnt representations. In the case of cryo-ET, this is achieved by incorporating our prior knowledge of protein structure into the learning problem. Our method has been implemented in an open-source library in collaboration with the RFI and STFC. The package is publicly available on The Alan Turing Institute GitHub organisation and is under continuous improvement and development.
https://github.com/alan-turing-institute/affinity-vae
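For orientation, the objective can be sketched as below. The first two terms are the standard $\beta$-VAE loss (reconstruction error plus a $\beta$-weighted KL divergence to the prior). The third term is an illustrative assumption: the symbols $\gamma$ and $\mathcal{L}_{\mathrm{aff}}$ are shorthand used here for a regulariser that encodes prior structural similarity, and may not match the exact form implemented in the library.

$$
\mathcal{L}(x) = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] + \beta\, D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right) + \gamma\, \mathcal{L}_{\mathrm{aff}}
$$

Setting $\gamma = 0$ recovers the plain $\beta$-VAE, and additionally setting $\beta = 1$ recovers the standard VAE.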
### Leveraging research activities to build partnerships
+ Francis Crick Institute
+ Following the _Living Systems Symposium_ that we organised, we talked with [James Briscoe](https://www.crick.ac.uk/research/labs/james-briscoe), Associate Research Director at the Francis Crick Institute. He expressed an interest in developing a deeper partnership with the Turing.
+ We are in the early stages of building a collaboration with [Silvia Santos](https://www.crick.ac.uk/research/labs/silvia-santos) to determine proteomic signatures at sub-cellular resolution and use these to predict cellular phenotypes.
+ Wellcome Sanger Institute
+ With other members of the Health team, we have been developing a partnership with the Wellcome Sanger Institute.
+ These new collaborative projects will synergise with this project.
+ The new partnership projects will leverage investments in observing biological systems at cellular resolution to develop state-of-the-art generative machine learning tools that can model cellular behaviours across various modalities and scales.
### External engagement, influence, impact
+ Interviewed -- not for attribution -- alongside experts from Google DeepMind, Anthropic, OpenAI, UK and US governments, MIT, University of Washington and NIH for a new report _"The Convergence of AI and the Life Sciences"_. The report will be launched at the AI Fringe event on 30th October 2023 ahead of the AI safety summit at Bletchley Park.
## Status of progress against goals/milestones, any blockers to progress
### Affinity-VAE
+ We have successfully filled the gaps in validating the model on a variety of datasets. This includes:
+ Synthetic 2D alphanumeric dataset
+ [MNIST dataset](https://paperswithcode.com/dataset/mnist)
+ [SHREC21 Protein dataset](http://shrec.ge.imati.cnr.it/shrec21_protein/).
+ Denoised SHREC21 dataset
+ We have performed extensive high-throughput calculations on each dataset to ensure the reliability of the results and confidence in the outcome.
+ The current goals are to summarise the results, update our pre-print and prepare for submission.
+ _profet_ has been fully integrated in the RFI's _parakeet_ pipeline (https://github.com/rosalindfranklininstitute/parakeet), a cryoEM/cryoET digital twin.
+ STFC will provide real-world data and evaluate the models.
+ RFI will create a large dataset of simulated structures to scale-up training of the model.
### GRACE
+ Methodological development of a multi-step image analysis pipeline to detect higher-order structures in noisy bioimaging data, combining feature extraction, graph neural networks and combinatorial optimisation.
+ Our preliminary deployment uses synthetic datasets generated from simple geometrical objects as a proof-of-concept demonstration.
+ We have experimental datasets from public databases and from collaborators at the University of Bristol.
+ An extended abstract detailing the application to cryo-EM will be submitted soon.
+ A longer technical paper, with broader applications to biomedical imaging, is in the planning stages.
## Conference/workshop talks presented (please name the event, dates – include link/s)
+ CCP-EM Spring Symposium, Nottingham, UK, 25-27 April 2023 https://www.ccpem.ac.uk/training/spring_symposium_2023/spring_symposium_2023.php
+ NEUBIAS Symposium, Porto, Portugal (in person), 11-12 May 2023 https://eubias.org/NEUBIAS
+ CVPR CVMI Workshop, Vancouver, Canada (virtual), 19 Jun 2023 https://cvmi-workshop.github.io/spotlight.html
+ Talk presented at the Microscience Microscopy Congress 2023 (in person), 4 - 6 July 2023, Manchester, UK https://www.mmc-series.org.uk
+ Talk presented at the Napari Hack Day (virtual), 26 July 2023, London, UK https://happeningnext.com/event/napari-hack-day-eid4so4454ddd1
+ ICML Comp Bio workshop, Honolulu, Hawaii, U.S. (in person), 29 Jul 2023 https://icml-compbio.github.io/
+ Artificial Intelligence for Quantitative Cancer Analysis, Turku, Finland (in person), 29 Aug - 1 Sep, 2023 https://eurobioimaging.fi
+ Talks at the _Learning the organisational principles of living systems_ symposium (in person), 14 Sep 2023, London, UK https://www.turing.ac.uk/research/interest-groups/learning-organisational-principles-living-systems
+ Uncertainty Quantification for Generative Modelling Workshop, The Alan Turing Institute, 15 Sep 2023, https://www.eventsforce.net/turingevents/frontend/reg/thome.csp?pageID=110064&eventID=287
+ Lightning talk + poster presented at Neuroinformatics Assembly (virtual), 18-22 September 2023 https://neuroinformatics.incf.org
## Papers submitted or published (include link/s)
+ _"Affinity-VAE for disentanglement, clustering and classification of objects in multidimensional image data"_ [arXiv](https://arxiv.org/abs/2209.04517), under revision
+ _"profet: A Python package for fetching protein structures from multiple data sources"_ [JOSS](https://github.com/openjournals/joss-reviews/issues/5705), under revision.
+ _"Virtual perturbations to assess explainability of deep-learning based cell fate predictors"_ [bioRxiv](https://www.biorxiv.org/content/10.1101/2023.07.17.548859v1.full), accepted at [ICCV](https://openaccess.thecvf.com/content/ICCV2023W/BIC/html/Soelistyo_Virtual_Perturbations_to_Assess_Explainability_of_Deep-Learning_Based_Cell_Fate_ICCVW_2023_paper.html)
+ _"Learning dynamic image representations for self-supervised cell cycle annotation"_ [bioRxiv](https://www.biorxiv.org/content/10.1101/2023.05.30.542796v1.full), accepted at [ICML](https://icml-compbio.github.io/2023/papers/WCBICML2023_paper23.pdf)
+ _"Machine learning enhanced cell tracking"_ [Frontiers in Bioinformatics](https://www.frontiersin.org/articles/10.3389/fbinf.2023.1228989/full)
## Articles/blogs/press releases published (include link/s)
+ NA
## Software/code/tools/methods developed/released (include link/s)
+ The software we have developed is a variation of $\beta$-VAE and is publicly available on The Alan Turing Institute GitHub organisation. It is a high-quality library which includes a variety of visualisation and validation functionality (see [Affinity-VAE](https://github.com/alan-turing-institute/affinity-vae)).
+ Software package allowing the search and download of protein structures from different sources. First PyPI release (see [profet](https://github.com/alan-turing-institute/profet)).
+ New features (such as CLI download) added in preparation for the first release (see [EMPIARreader](https://github.com/alan-turing-institute/empiarreader)).
+ Development of general graph neural network + combinatorial optimisation methodology (see [GRACE](https://github.com/alan-turing-institute/grace)).
+ In collaboration with the scivision team, we started a new project [pixelflow](https://github.com/alan-turing-institute/pixelflow) that can be used to apply scivision models and other ML and metrology packages to scientific image data.
## New datasets created/accessed
+ Synthetic dataset for geometric shape identification in graphical structures from (noisy) images
+ CryoEM annotated dataset from collaborators at the University of Bristol
+ 2D Alphanumeric Dataset
+ Any EMPIAR dataset can now be downloaded locally using _EMPIARreader_
+ Any protein structure can be retrieved from two databases, the Protein Data Bank and AlphaFold, using _profet_
## External engagement, influence, impact (Academic/ industry/ government/ public/ international)
+ Organised the _Living Systems Symposium_ held in September 2023 at the British Medical Association (in person).
+ https://www.turing.ac.uk/research/interest-groups/learning-organisational-principles-living-systems
+ Brought in keynote speakers from EPFL, Switzerland and the Wellcome Sanger Institute, UK. Also hosted short talks from the community.
+ Hosted a panel discussion about the intersection of AI and the life sciences.
+ Event was very successful. Many people thanked us for organising an event to bring the community together.
+ Over 150 registrants and approximately 50 in-person attendees on the day.
+ Although the _Affinity-VAE_ project was developed to study proteins, we have extended its application to other disciplines. This includes:
+ Collaboration with an astrophysicist at Harvard University to identify objects of interest in the [Sky Map](http://www.skymaponline.net/).
+ Classification of rapeseeds to identify the stage of maturity for harvest (work in collaboration with Turing researchers and the [Scivision project](https://github.com/alan-turing-institute/scivision)).
+ ICML WiML Un-Workshop: breakout session facilitator and social media manager. Honolulu, Hawaii, U.S. (in person), 29 Jul 2023. https://sites.google.com/wimlworkshop.org/wiml-unworkshop-2023/home
+ Data Science role models, Institute of Physics, Summer 2023. https://www.iop.org/explore-physics/mimis-rainbow-adventure/data-science-role-models
+ Keynote speaker at STEM night + invited speaker to classes, St Gilgen's International School, Austria, June 2023
## Funding: Further funding, leveraged funding/support, in-kind contributions
+ Out-of-budget funding via Turing Vision.
+ ICML Conference Expenses via 'Strategic Priorities Fund R-SPF1-001'
## Patents (drafted/applied for/granted)
+ NA
## Awards/recognition
+ Best Poster Award @ ICML Comp Bio Workshop 2023 | [Tweet link](https://twitter.com/KristinaUlicna/status/1685934857225220096)