[Code style: black](https://github.com/psf/black)
[tests](https://github.com/alan-turing-institute/affinity-vae/actions/workflows/tests.yml)
[pre-commit](https://github.com/pre-commit/pre-commit)
# Affinity-VAE
**Affinity-VAE for disentanglement, clustering and classification of objects in multidimensional image data**
Mirecka J, Famili M, Kotanska A, Jurashcko N, Costa-Gomes B, Palmer CM, Thiyagalingam J, Burnley T, Basham M & Lowe AR
[doi:10.48550/arXiv.2209.04517](https://doi.org/10.48550/arXiv.2209.04517)
## Installation
### Installing with pip + virtual environments
> Note: This has been tested in the `refactor` branch.
You can install the libraries needed for this package in a [fresh virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) with the following:
```
python3 -m venv env
source env/bin/activate
pip install -e .
```
> Note: This is the preferred option for running on Turing macOS laptops.
> Warning: M1 macOS cannot do [PyTorch parallelisation](https://github.com/pytorch/pytorch/issues/70344). A temporary workaround is to set `num_workers=0` on the DataLoaders in data.py. Otherwise you will get the error: `AttributeError: Can't pickle local object 'ProteinDataset.__init__.<locals>.<lambda>'`.
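
The pickling error can be reproduced without PyTorch. The sketch below is only an illustration. It assumes the cause is that worker processes on macOS are started with the `spawn` method, which has to pickle the dataset object. `LambdaDataset`, `FunctionDataset`, `scale` and `can_pickle` are hypothetical names and are not part of affinity-vae.

```python
import pickle


class LambdaDataset:
    """Hypothetical dataset that stores a lambda defined inside __init__."""

    def __init__(self):
        self.transform = lambda x: x * 2  # local object, cannot be pickled


def scale(x):
    """Module-level function: pickled by reference, so it can be sent to workers."""
    return x * 2


class FunctionDataset:
    """Same hypothetical dataset, but the transform is a module-level function."""

    def __init__(self):
        self.transform = scale


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, as spawned workers require."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


print(can_pickle(LambdaDataset()))
print(can_pickle(FunctionDataset()))
```

Setting `num_workers=0` sidesteps the problem because the data is then loaded in the main process and nothing needs to be pickled. Replacing the lambda with a named module-level function would be an alternative that keeps multiple workers, though that has not been tested here.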
### Installing with conda in Baskerville
The following is the recommended way of installing all libraries on Baskerville.
```
conda create --name affinity_env
conda activate affinity_env
conda install --yes python=3.10
conda install --yes numpy
conda install --yes requests
conda install -c anaconda pandas
conda install -c anaconda scikit-image
conda install -c anaconda scikit-learn
conda install -c anaconda scipy
conda install -c anaconda pillow
conda install -c conda-forge mrcfile
conda install -c conda-forge altair
conda install -c conda-forge umap-learn
conda install -c conda-forge matplotlib
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
```
If the following error occurs:
```
ImportError: libtiff.so.5: cannot open shared object file: No such file or directory
```
you can resolve it via:
```
conda install -c anaconda libtiff==4.4.0
```
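After either installation route, a quick way to see whether anything is missing is to ask Python which packages it can locate. This is a minimal sketch and not part of the repository: `missing_modules` is a hypothetical helper, and the list uses the usual import names of the packages installed above (for example `skimage` for scikit-image and `PIL` for pillow).

```python
import importlib.util

# Import names of the packages installed in the conda instructions above.
REQUIRED = [
    "numpy", "requests", "pandas", "skimage", "sklearn", "scipy", "PIL",
    "mrcfile", "altair", "umap", "matplotlib", "torch", "torchvision",
]


def missing_modules(names=REQUIRED):
    """Return the top-level modules from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Note that this only checks that the packages can be found. It does not import them, so it will not catch a shared-library problem such as the `libtiff.so.5` error above.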
### Quick start
Affinity-VAE has a run script (`run.py`) that allows you to configure and run the code. You can look at the available configuration options by running:
```
python run.py --help
```
which will give you:
```
Usage: run.py [OPTIONS]

Options:
  -d, --datapath TEXT         Path to training data.  [required]
  -lm, --limit INTEGER        Limit the number of samples loaded (default
                              None).
  -sp, --split INTEGER        Train/val split in %.
  -nd, --no_val_drop          Do not drop last validate batch if if it is
                              smaller than batch_size.
  -ep, --epochs INTEGER       Number of epochs (default 100).
  -ba, --batch INTEGER        Batch size (default 128).
  -de, --depth INTEGER        Depth of the convolutional layers (default 3).
  -ch, --channels INTEGER     First layer channels (default 64).
  -ld, --latent_dims INTEGER  Latent space dimension (default 10).
  -pd, --pose_dims INTEGER    If pose on, number of pose dimensions.
  -b, --beta FLOAT            Variational beta (default 1).
  -g, --gamma FLOAT           Scale factor for the loss component
                              corresponding to shape similarity (default 1).
  -lr, --learning FLOAT       Learning rate (default 1e-4).
  -lf, --loss_fn TEXT         Loss type: 'MSE' or 'BCE' (default 'MSE').
  -fev, --freq_eval INTEGER   Frequency at which to evaluate test set (default
                              every 10 epochs).
  -fs, --freq_sta INTEGER     Frequency at which to save state (default every
                              10 epochs).
  -fe, --freq_emb INTEGER     Frequency at which to visualise the latent space
                              embedding (default every 10 epochs).
  -fr, --freq_rec INTEGER     Frequency at which to visualise reconstructions
                              (default every 10 epochs).
  -fi, --freq_int INTEGER     Frequency at which to visualise latent
                              spaceinterpolations (default every 10 epochs).
  -ft, --freq_dis INTEGER     Frequency at which to visualise single
                              transversals (default every 10 epochs).
  -fp, --freq_pos INTEGER     Frequency at which to visualise pose (default
                              every 10 epochs).
  -fac, --freq_acc INTEGER    Frequency at which to visualise confusion
                              matrix.
  -fa, --freq_all INTEGER     Frequency at which to visualise all plots except
                              loss (default every 10 epochs).
  -ve, --vis_emb              Visualise latent space embedding.
  -vr, --vis_rec              Visualise reconstructions.
  -vl, --vis_los              Visualise loss.
  -vi, --vis_int              Visualise interpolations.
  -vt, --vis_dis              Visualise single transversals.
  -vps, --vis_pos             Visualise pose interpolations in the first 2
                              dimensions
  -vac, --vis_acc             Visualise confusion matrix.
  -va, --vis_all              Visualise all above.
  -g, --gpu                   Use GPU for training.
  -ev, --eval                 Evaluate test data.
  -dn, --dynamic              Enable collecting meta and dynamic latent space
                              plots.
  --help                      Show this message and exit.
```
### Running on example data
You can run on example data with the following command:
```
python affinity-vae/run.py -d data/subtomo_files --split 20 --epochs 10 -ba 128 -lr 0.001 -de 4 -ch 64 -ld 8 -pd 3 --beta 1 --gamma 2 --limit 1000 --freq_all 5 --vis_all --dynamic
```
where **subtomo_files** is a directory containing a number of `.mrc` protein image files whose names begin with the protein keyword (for example `1BXN_m0_156_Th0.mrc`, `5MRC_m8_1347_Th0.mrc`). The **subtomo_files** directory should also contain a `classes.csv` file listing the protein names and keywords to be considered (`1BXN`, `5MRC`, etc.) and an `affinity_scores.csv` matrix with the initial values for the proteins named in `classes.csv`.
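
Before starting a long run it can help to check that the data directory has the layout described above. The following is a minimal sketch, not part of affinity-vae: `check_data_dir` is a hypothetical helper, and it assumes that `classes.csv` holds one class name per line and that the class keyword is the part of each file name before the first underscore.

```python
from pathlib import Path


def check_data_dir(datapath):
    """Return a list of problems found in a data directory (empty if none)."""
    root = Path(datapath)
    problems = []

    # Both CSV files are expected next to the .mrc files.
    for name in ("classes.csv", "affinity_scores.csv"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")

    # Assumption: classes.csv lists one class name per line.
    classes = set()
    classes_file = root / "classes.csv"
    if classes_file.is_file():
        lines = classes_file.read_text().splitlines()
        classes = {line.strip() for line in lines if line.strip()}

    mrc_files = sorted(root.glob("*.mrc"))
    if not mrc_files:
        problems.append("no .mrc files found")

    # Assumption: the keyword is the file name up to the first underscore,
    # e.g. "1BXN" for "1BXN_m0_156_Th0.mrc".
    for path in mrc_files:
        keyword = path.name.split("_")[0]
        if classes and keyword not in classes:
            problems.append(f"{path.name}: class {keyword} not in classes.csv")

    return problems


if __name__ == "__main__":
    for problem in check_data_dir("data/subtomo_files"):
        print(problem)
```

The check only looks at file names and does not open the `.mrc` files or validate the contents of the affinity matrix.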
------------------------------
OLD NOTES
================
# Affinity VAE test
All these tests were run on macOS.
# Data SHREC
## Data
Data can be found at this [link](https://dataverse.nl/dataset.xhtml;jsessionid=d9442e15796459083ad19442efa4?persistentId=doi%3A10.34894%2FXRTJMA&version=&q=&fileTypeGroupFacet=%22Archive%22&fileTag=&fileSortField=name&fileSortOrder=desc).
To start, download the file called `shrec21_full_dataset.zip` (size ~7.9 GB).
## Script to cut the data into proteins
The downloaded data is one large image containing many proteins. Marjan wrote a script that cuts out the individual proteins:
`python create_dataset_shrec.py`
Notes from Camila:
- Need to update the paths in the script.
- Ask Marjan about the molecule list and affinity files.
- molecule_list.csv is the same file as the classes.csv referred to [here](https://github.com/alan-turing-institute/affinity-vae/blob/e2e2d9e743fdcf0a3e70ef5aae01c1cf03258de2/avae/data.py#L164). Make sure to have both files for now.
Napari to view data: https://napari.org/stable/
## Run
`python3 affinity-vae/run.py -d path-todata/subtomo_files --split 20 --epochs 500 -ba 128 -lr 0.001 -de 4 -ch 64 -ld 8 -pd 3 --beta 1 --gamma 2 --limit 1000 --freq_all 10 --vis_all --dynamic`
### Requirements
- `classes.csv` file: derived from the molecule list; it needs to be a line-separated list (one entry per line) rather than comma-separated.
- affinity matrix
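
The conversion of the molecule list into a line-separated `classes.csv` can be sketched as follows. This is a hedged illustration only: `to_line_separated` is a hypothetical helper, and it assumes the molecule list is a plain comma-separated list of names with no header or quoted fields.

```python
from pathlib import Path


def to_line_separated(text):
    """Turn comma-separated names into one name per line."""
    # Treat existing newlines as separators too, then drop empty entries.
    names = [name.strip() for name in text.replace("\n", ",").split(",")]
    return "\n".join(name for name in names if name) + "\n"


if __name__ == "__main__":
    # File names follow the notes above; adjust the paths to your data.
    source = Path("molecule_list.csv")
    Path("classes.csv").write_text(to_line_separated(source.read_text()))
```

Writing to a separate `classes.csv` keeps the original `molecule_list.csv` in place, which matches the earlier note about keeping both files for now.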
## Environments
You can try installing with conda (this has worked on Baskerville) or with a venv plus a pip install of the requirements file.
## venv environment
**This one installs without issues**
Create a venv environment, activate it,
then run
`pip install -r requirements.txt`
This works. Now try running as described above:
Notes from Camila:
- Running currently fails with the error `AttributeError: Can't pickle local object 'ProteinDataset.__init__.<locals>.<lambda>'`
- This happens because macOS cannot yet parallelise with torch (see this [issue](https://github.com/pytorch/pytorch/issues/70344#issuecomment-1005013413)). It is solved by changing `num_workers` to 0 in the DataLoader initialisation.
### Conda environment
conda install --yes python=3.10
conda install --yes numpy
conda install --yes requests
conda install -c anaconda pandas
conda install -c anaconda scikit-image
conda install -c anaconda scikit-learn
conda install -c anaconda scipy
conda install -c anaconda pillow
conda install -c conda-forge mrcfile
conda install -c conda-forge altair
conda install -c conda-forge umap-learn
conda install -c conda-forge matplotlib
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
If this error occurs: `ImportError: libtiff.so.5: cannot open shared object file: No such file or directory`
Resolve it with:
`conda install -c anaconda libtiff==4.4.0`
Notes from Camila:
- Currently there are compatibility problems between Python 3.10 and PyTorch. Trying to solve them by downgrading Python has not worked yet.
## Precommit