# Omniclustering Round 2
This is a document to coordinate the second round of Omniclustering (round 1 [here](https://renkulab.io/gitlab/omnibenchmark/omni_clustering)). Two versions will be created:
A) a **'pinned' version** that will use the same software versions as the initial publication
B) an **'updated' version** that will use updated source code and package versions.
**Rationale**: show the capability of Omnibenchmark to reproduce a benchmark ('pinned' version) and provide another version, open for collaboration and updates, for up-to-date clustering method recommendations ('updated' version).
## Parameters choice
### Former version:
| Parameter| OmniClustering |Duo 2018 |
| -------- | -------- | -------- |
| K (desired number of clusters) | Within a [2-15] range, +/-3 to true n clust | [2-10] small datasets, [2-15] bigger datasets (1) |
|n seeds (= n runs?) | 3 | 5 |
| resolution (Seurat) | ~~[0.9](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/r-seurat/-/blob/master/src/r-seurat.R#L43), fixed for all datasets~~ implement a dataset-specific range such as in original paper [[0.3-1.5]](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#LL134C1-L134C1) | [[0.3-1.5]](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#LL134C1-L134C1) and dataset specific|
| xdim, ydim (FlowSOM) | default (???) | [5 to 15](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#L282-L290), depending on the dataset's size (n.cells) |
(1) SimKumar8hard, Zhengmix8eq, KohTCC, Koh
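
As a rough illustration of the two K schemes in the table, the sketch below builds the candidate K values for each. It assumes that footnote (1) lists the datasets that get the wider [2-15] range. The function names are hypothetical and are not taken from `filter_combinations.py` or from Duo's scripts.

```python
from __future__ import annotations

# Assumed reading of footnote (1): these datasets use the wider [2-15] range.
WIDE_RANGE_DATASETS = {"SimKumar8hard", "Zhengmix8eq", "KohTCC", "Koh"}


def duo_k_values(dataset: str) -> list[int]:
    """Duo-2018-style grid: 2..15 for footnote (1) datasets, 2..10 otherwise."""
    upper = 15 if dataset in WIDE_RANGE_DATASETS else 10
    return list(range(2, upper + 1))


def omniclustering_k_values(true_k: int, window: int = 3,
                            k_min: int = 2, k_max: int = 15) -> list[int]:
    """K values within +/- `window` of the true cluster number, clipped to [k_min, k_max]."""
    low = max(k_min, true_k - window)
    high = min(k_max, true_k + window)
    return list(range(low, high + 1))


print(omniclustering_k_values(4))  # [2, 3, 4, 5, 6, 7]
print(duo_k_values("Kumar"))       # 2..10
```

With a true cluster number of 4, the +/-3 window is cut off at the lower bound of 2, so only six K values are run instead of seven.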
### Changes for new version
| Parameter | Value | Where |
| -------- | -------- | ------|
| K | Keep the +/-3 to true n clust, as a first step (combinations created by `filter_combinations.py`). Later, increase to 2:10 and 2:15 | parameter project |
| seed | 1 seed, as a first step | parameter project |
| xdim, ydim | N.cells < 300: xdim = ydim = 5; 300 < N.cells < 600: xdim = ydim = 8; otherwise: xdim = ydim = 15 | [FlowSOM](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/clustering/apply_FlowSOM.R#L17-L18) |
| resolution | [0.9](https://gitlab.renkulab.io/omni_hackathon/omni_clustering/r-seurat/-/blob/master/src/r-seurat.R#L43) as a first step. [Conditional ranges](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#L134) later (we also need to find the rules that defined the range used in Duo). | Seurat |
Once the first workflows have been run and depending on computing time, increase **K**, **seed**,
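
The xdim/ydim rule from the table can be written as a small helper. This is only a sketch: `flowsom_grid_size` is a hypothetical name, and sending a dataset with exactly 300 cells to the 8x8 grid is an assumption, since the table leaves that boundary open.

```python
def flowsom_grid_size(n_cells: int) -> int:
    """Return the value used for both xdim and ydim, given the number of cells."""
    if n_cells < 300:
        return 5
    if n_cells < 600:
        # Assumption: exactly 300 cells also lands here (unspecified in the table).
        return 8
    # Everything else, including exactly 600 cells, gets the largest grid.
    return 15


for n in (250, 450, 4000):
    dim = flowsom_grid_size(n)
    print(f"{n} cells -> xdim = ydim = {dim}")
```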
### Filtering
[Example project](https://gitlab.renkulab.io/omb_benchmarks/omniclustering_pinned/method_PCAHC_pinned)
1. Add the [filtering functions](https://renkulab.io/gitlab/omb_benchmarks/omniclustering_pinned/method_PCAHC_pinned/-/blob/main/src/filter_combinations.py) (if necessary, add **pandas** to `requirements.txt`).
2. Modify [`run_workflow.py`](https://renkulab.io/gitlab/omb_benchmarks/omniclustering_pinned/method_PCAHC_pinned/-/blob/master/src/run_workflow.py) to:
* import the filtering function (`import filter_combinations`; for testing, `import src.filter_combinations as filter_combinations`)
* generate a json file with all combinations to filter:
```python
import json  # needed at the top of run_workflow.py

# generate json with filter combinations
filter_comb = filter_combinations.get_param_filter_by_ground_truth(omni_obj)
with open("src/filter_comb.json", "w") as fp:
    json.dump(filter_comb, fp, indent=3)
# commit the new file before continuing the workflow
renku_save()
```
* update `omni_obj`:
```python
# update outputs and commands
omni_obj.outputs.filter_json = "src/filter_comb.json"
omni_obj.outputs.update_outputs()
omni_obj.command.outputs = omni_obj.outputs
omni_obj.command.update_command()
```
## Assignments
Assign yourself to one of the following projects (if possible, starting with the pinned version).
**Benchmark name**: `omniclustering_pinned`
### A) Pinned version
Datasets: **keyword**: `dataset_omniclustering_pinned`
Naming: `[NAME]_pinned` e.g. `Koh_filterExpr_pinned`, `KohTCC_filterHVG_pinned`
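
The naming and keyword conventions can be sketched as below. The helper names are hypothetical, the `[MODULE]_omniclustering_pinned` keyword pattern is inferred from the keywords listed in this document, and using the project name as the dataset name is an assumption. The command string only mirrors the `renku dataset create -k DATA_KEYWORD DATA_NAME` call listed in the fixes; nothing is executed.

```python
import shlex

BENCHMARK = "omniclustering_pinned"


def pinned_project_name(name: str) -> str:
    """Append the `_pinned` suffix unless the name already carries it."""
    return name if name.endswith("_pinned") else f"{name}_pinned"


def module_keyword(module: str, benchmark: str = BENCHMARK) -> str:
    """Build a keyword such as `dataset_omniclustering_pinned`."""
    return f"{module}_{benchmark}"


def dataset_create_command(data_name: str, keyword: str) -> str:
    """Assemble (but do not run) the renku CLI call as a shell string."""
    return shlex.join(["renku", "dataset", "create", "-k", keyword, data_name])


name = pinned_project_name("Koh_filterExpr")
print(name)
print(dataset_create_command(name, module_keyword("dataset")))
```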
:::info
#### Fixes:
1. Direct the [ExperimentHub cache](https://support.bioconductor.org/p/106349/) towards a directory where you have read and write permissions at the top of the R script:
```
options(EXPERIMENT_HUB_CACHE="/home/rstudio/.cache")
```
2. Explicitly load the `SingleCellExperiment` library
3. The filtered objects do not contain sizeFactors. In the R script, change the metadata file content to something like:
```
meta <- data.frame("pheno_id" = sce$phenoid,
                   "size_factor" = sce$total_counts,
                   "cell_id" = colnames(sce))
```
4. Add logcounts to the output files by modifying `src/config.yaml` and `src/run_workflow.py` (see [example](https://renkulab.io/gitlab/omb_benchmarks/omniclustering_pinned/koh-filterexpr-pinned))
5. (Fixed with omniValidator 0.0.19): ~~Disable checks by omniValidator~~
6. If you get a warning about an existing dataset with a matching name, generate the dataset via the __renku cli__:
```bash=
renku dataset create -k DATA_KEYWORD DATA_NAME
```
:::
Each dataset needs to be set up with **both filterings**.
- [x] [**Koh**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/koh) --> Almut
- [x] [**KohTCC**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/ttc-koh) --> Anthony
- [x] [**Kumar**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/kumar) --> Anthony
- [x] [**KumarTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ttc-kumar)
- [x] [**SimKumar4easy**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-4-easy) --> Mark [DONE]
- [x] [**SimKumar4hard**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-4-hard) --> Anthony
- [x] [**SimKumar8hard**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-8-hard) --> Anthony
- [x] [**Trapnell**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/trapnell) --> Anthony
- [x] [**TrapnellTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/tcc-trapnell) --> Anthony
- [x] [**ZhengMix4eq**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/zhengmix4eq) --> Anthony
- [x] **ZhengMix4uneq** (project dropped, copy the content of ZhengMix4eq and change the [main function](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/zhengmix4eq) to `sce_full_Zhengmix4uneq()` ) --> Anthony
- [x] [**ZhengMix8eq**](https://renkulab.io/gitlab/omnibenchmark/omni_data/zhengmix8eq) --> Anthony
~~Preprocessing: **keyword**: `preprocessing_omniclustering_pinned`~~
- [ ] ~~[**filterExpression**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/filter-expression)(TBP: scran_1.6.9, scater_1.8.0)~~
- [ ] ~~[**filterM3Drop**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/filter-m3drop) (TBP: scater_1.8.0, M3Drop_1.4.0)~~
Methods: **keyword**: `method_omniclustering_pinned`
**Ignore the fix section in `run_workflow.py`**
- [x] [**ascend**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ascend-clustering) Note: also copy the `install_r.sh` file to the new project. --> Anthony
- [x] [**CIDR**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/cidr-clustering) --> Anthony
- [x] [**FlowSOM**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/flowsom-clustering) (TBP -> v1.12.0) see also the [parameter](https://hackmd.io/-ZHBAw06SK-oFAxb5tZz6g#New-version) section --> Anthony
- [x] [**Monocle**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/monocle-clustering) (TBP -> v2.26.0) --> Anthony
- [x] [**PCAHC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/pca-hc) --> Almut
- [x] [**PCAKmeans**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/pcakmeans-clustering) --> Anthony
- [x] [**PCAReduce**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/pcareduce-clustering) --> Anthony
- [x] [**RaceID2**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/raceid2-clustering) (TBP -> March 3 2017) --> Anthony
- [x] [**RtsneKmeans**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/rtsne-kmeans) (TBP -> v.0.13) --> Anthony
- [x] [**SAFE**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/safe-clustering) --> Anthony
- [x] [**SC3**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/clustering-sc3) (TBP -> v.1.8.0) --> Anthony
- [x] [**SC3svm**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/sc3-svm-clustering) (TBP -> v.1.8.0) --> Anthony
- [ ] [**Seurat**](https://gitlab.renkulab.io/omni_hackathon/omni_clustering/r-seurat) (TBP -> 2.3.1) see also the [parameter](https://hackmd.io/-ZHBAw06SK-oFAxb5tZz6g#New-version) section --> Mark
- [x] [**TSCAN**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/tscan-clustering) (TBP -> 1.18.0) --> Anthony
Parameters: **keyword**: `params_omniclustering_pinned`
- [x] [**clustering parameters**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/omni-clustering-param) --> Almut
Metrics: **keyword**: `metrics_omniclustering_pinned`
- [x] [**ARI at true # clusters**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/ari-at-true-n-clust) (mclust) --> Anthony
- [ ] [**Adjusted Rand index**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/adjusted-rand-index) --> Anthony
- [x] [**Shannon Entropy**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/shannon-entropy) --> Anthony
(TBP) To Be Pinned; these projects have updated versions but need to be downgraded and pinned. See the original [wrappers](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/tree/master/Rscripts/clustering) for the code.
<details>
<summary> UPDATED VERSION (for later) </summary>
### B) Updated version
*Namespace and projects to come soon...*
List of projects and their former links:
Datasets:
- [ ] [**Koh**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/koh)
- [ ] [**KohTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ttc-koh)
- [ ] [**Kumar**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/kumar)
- [ ] [**KumarTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ttc-kumar)
- [ ] [**SimKumar4easy**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-4-easy)
- [ ] [**SimKumar4hard**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-4-hard)
- [ ] [**SimKumar8hard**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-8-hard)
- [ ] [**Trapnell**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/trapnell)
- [ ] [**TrapnellTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/tcc-trapnell)
- [ ] [**ZhengMix4eq**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/zhengmix4eq)
- [ ] **ZhengMix4uneq** (project dropped, copy the content of ZhengMix4eq and change the [main function](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/zhengmix4eq) to `sce_full_Zhengmix4uneq()` )
- [ ] [**ZhengMix8eq**](https://renkulab.io/gitlab/omnibenchmark/omni_data/zhengmix8eq)
Preprocessing:
- [ ] [**filterExpression**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/filter-expression)
- [ ] [**filterM3Drop**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/filter-m3drop)
Methods:
- [ ] [**ascend**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ascend-clustering) (TBU)
- [ ] [**CIDR**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/cidr-clustering) (TBU)
- [ ] [**FlowSOM**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/flowsom-clustering)
- [ ] [**Monocle**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/monocle-clustering)
- [ ] [**PCAHC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/pca-hc)
- [ ] [**PCAKmeans**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/pcakmeans-clustering)
- [ ] [**PCAReduce**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/pcareduce-clustering)
- [ ] [**RaceID2**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/raceid2-clustering)
- [ ] [**RtsneKmeans**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/rtsne-kmeans)
- [ ] [**SAFE**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/safe-clustering) (TBU)
- [ ] [**SC3**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/clustering-sc3)
- [ ] [**SC3svm**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/sc3-svm-clustering)
- [ ] [**Seurat**](https://gitlab.renkulab.io/omni_hackathon/omni_clustering/r-seurat)
- [ ] [**TSCAN**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/tscan-clustering)
Parameters:
- [ ] [**clustering parameters**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/omni-clustering-param)
Metrics:
- [ ] [**ARI at true # clusters**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ari-at-true-n-clust) (mclust)
- [ ] [**Adjusted Rand index**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/adjusted-rand-index)
- [ ] [**Shannon Entropy**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/shannon-entropy)
(TBU): To Be Updated. The former renku projects have pinned versions and need to be updated to the latest versions.
</details>
## Procedure
- [x] ~~Create a new project from Renku website. For pinned projects, append a `_pinned` suffix to their name.~~
~~https://renkulab.io/projects/new?data=eyJuYW1lc3BhY2UiOiJvbWJfYmVuY2htYXJrcy9vbW5pY2x1c3RlcmluZ19waW5uZWQiLCJ2aXNpYmlsaXR5IjoicHVibGljIiwidXJsIjoiaHR0cHM6Ly9naXRodWIuY29tL29tbmliZW5jaG1hcmsvY29udHJpYnV0ZWQtcHJvamVjdC10ZW1wbGF0ZXMiLCJyZWYiOiJDTElfcmVua3V2Ml9SM181In0%3D~~
- [ ] Pick the project from the [terraformed omnibenchmark](https://gitlab.renkulab.io/omb_benchmarks/omniclustering_pinned)
- [ ] Disable "*Enable shared runners for this project*": Settings > CI/CD > Runners
- [ ] Copy the following files from the original project:
- `install.R`
- files in `src/`
- [ ] If the project has to be downgraded (pinned benchmark) or upgraded (updated benchmark):
- `install.R` (for down/upgrades): pin the relevant R package(s) to the old/new version
- `src/[MAIN_SCRIPT].R` (for down/upgrades): test the wrapper on one dataset
- `src/config.yaml`: modify the `data:`, `script:` and `benchmark_name` sections so that they match the current project name/title, main script and benchmark name.
- [ ] Double check that the fixed parameters correspond to the ones defined in [Duo's code](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#LL135C49-L137C1). For the updated version, choose an equivalent one.
- [ ] Make sure that the project is running (`run_workflow.py`) and double-check a few of the outputs.
:::warning
If `omni_obj.create_dataset()` warns that the dataset name is already taken, generate the dataset using the renku cli:
```bash=
renku dataset create -k DATA_KEYWORD DATA_NAME
```
:::
- [ ] Add the project to the corresponding orchestrator's CI/CD:
- Pinned: https://renkulab.io/gitlab/omb_benchmarks/omniclustering_pinned/orchestrator
- Updated: https://renkulab.io/gitlab/omb_benchmarks/omniclustering/orchestrator/-/blob/master/.gitlab-ci.yml
- [ ] Report any issue you encounter in the official documentation, under the 'Common bugs' section: https://github.com/omnibenchmark/documentation/tree/master/docs/04_bugs