# Omniclustering Round 2

This document coordinates the second round of Omniclustering (round 1 [here](https://renkulab.io/gitlab/omnibenchmark/omni_clustering)).

Two versions will be created:

- A) a **'pinned' version** that uses the same software versions as the initial publication
- B) an **'updated' version** that uses updated source code and package versions.

**Rationale**: show the capability of Omnibenchmark to reproduce a benchmark ('pinned' version) and provide another version, open for collaboration and updates, for up-to-date clustering method recommendations ('updated' version).

## Parameters choice

### Former version:

| Parameter | OmniClustering | Duo 2018 |
| -------- | -------- | -------- |
| K (desired number of clusters) | Within a [2-15] range, +/-3 to true n clust | [2-10] small datasets, [2-15] bigger datasets (1) |
| n seeds (= n runs?) | 3 | 5 |
| resolution (Seurat) | ~~[0.9](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/r-seurat/-/blob/master/src/r-seurat.R#L43), fixed for all datasets~~ implement a dataset-specific range such as in original paper [[0.3-1.5]](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#LL134C1-L134C1) | [[0.3-1.5]](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#LL134C1-L134C1) and dataset specific |
| xdim, ydim (FlowSOM) | default (???) | [5 to 15](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#L282-L290), depending on dataset's size (n.cells) |

(1) SimKumar8hard, Zhengmix8eq, KohTCC, Koh

### Changes for new version

| Parameter | Value | Where |
| -------- | -------- | ------ |
| K | Keep the +/-3 to true n clust, as a first step (combinations created by `filter_combinations.py`). Later, increase to 2:10 and 2:15 | parameter project |
| seed | 1 seed, as a first step | parameter project |
| xdim, ydim | (N.cells < 300) xdim = ydim = 5, (600 > N.cells > 300) xdim = ydim = 8, else: xdim = ydim = 15 | [FlowSOM](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/clustering/apply_FlowSOM.R#L17-L18) |
| resolution | [0.9](https://gitlab.renkulab.io/omni_hackathon/omni_clustering/r-seurat/-/blob/master/src/r-seurat.R#L43) as a first step. [Conditional ranges](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#L134) later (also need to find the rules that defined the range used in Duo). | Seurat |

Once the first workflows have been run and depending on computing time, increase **K**, **seed**,

### Filtering

[Example project](https://gitlab.renkulab.io/omb_benchmarks/omniclustering_pinned/method_PCAHC_pinned)

1. Add [filtering functions](https://renkulab.io/gitlab/omb_benchmarks/omniclustering_pinned/method_PCAHC_pinned/-/blob/main/src/filter_combinations.py). (if necessary, add **pandas** to `requirements.txt`)
2. Modify [`run_workflow.py`](https://renkulab.io/gitlab/omb_benchmarks/omniclustering_pinned/method_PCAHC_pinned/-/blob/master/src/run_workflow.py) to:
    * import the filtering function (`import filter_combinations`; to test, `import src.filter_combinations as filter_combinations`)
    * generate a json file with all combinations to filter:

      ```python
      import json  # needed for json.dump below

      # generate json with filter combinations
      # (omni_obj and renku_save are assumed to be defined earlier in run_workflow.py)
      filter_comb = filter_combinations.get_param_filter_by_ground_truth(omni_obj)
      with open("src/filter_comb.json", "w") as fp:
          json.dump(filter_comb, fp, indent=3)
      renku_save()
      ```

    * update `omni_obj`:

      ```python
      # update outputs and commands
      omni_obj.outputs.filter_json = "src/filter_comb.json"
      omni_obj.outputs.update_outputs()
      omni_obj.command.outputs = omni_obj.outputs
      omni_obj.command.update_command()
      ```

## Assignments

Assign yourself to one of the following projects (if possible, starting with the pinned version).

**Benchmark name**: `omniclustering_pinned`

### A) Pinned version

Datasets: **keyword**: `dataset_omniclustering_pinned`

Naming: `[NAME]_pinned` e.g. `Koh_filterExpr_pinned`, `KohTCC_filterHVG_pinned`

:::info
#### Fixes:
1. Direct the [ExperimentHub cache](https://support.bioconductor.org/p/106349/) towards a directory where you have read and write permissions at the top of the R script:
    ```
    options(EXPERIMENT_HUB_CACHE="/home/rstudio/.cache")
    ```
2. Explicitly load the `SingleCellExperiment` library
3. The filtered objects do not contain sizeFactors. In the R script, change the metadata file content to something like:
    ```
    meta <- data.frame("pheno_id" = sce$phenoid, "size_factor" = sce$total_counts, "cell_id" = colnames(sce))
    ```
4. Include logcounts in the output files by modifying `src/config.yaml` and `src/run_workflow.py` (see [example](https://renkulab.io/gitlab/omb_benchmarks/omniclustering_pinned/koh-filterexpr-pinned))
5. (Fixed with omniValidator 0.0.19): ~~Disable checks by omniValidator~~
6. Generate the dataset via __renku cli__ if you get a warning about an existing matching dataset name:
    ```bash=
    renku dataset create -k DATA_KEYWORD DATA_NAME
    ```
:::

Each dataset must be processed with **both filtering methods**.

- [x] [**Koh**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/koh) --> Almut
- [x] [**KohTCC**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/ttc-koh) --> Anthony
- [x] [**Kumar**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/kumar) --> Anthony
- [x] [**KumarTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ttc-kumar)
- [x] [**SimKumar4easy**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-4-easy) --> Mark [DONE]
- [x] [**SimKumar4hard**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-4-hard) --> Anthony
- [x] [**SimKumar8hard**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-8-hard) --> Anthony
- [x] [**Trapnell**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/trapnell) --> Anthony
- [x] [**TrapnellTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/tcc-trapnell) --> Anthony
- [x] [**ZhengMix4eq**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/zhengmix4eq) --> Anthony
- [x] **ZhengMix4uneq** (project dropped, copy the content of ZhengMix4eq and change the [main function](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/zhengmix4eq) to `sce_full_Zhengmix4uneq()`) --> Anthony
- [x] [**ZhengMix8eq**](https://renkulab.io/gitlab/omnibenchmark/omni_data/zhengmix8eq) --> Anthony

~~Preprocessing: **keyword**: `preprocessing_omniclustering_pinned`~~

- [ ] ~~[**filterExpression**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/filter-expression) (TBP: scran_1.6.9, scater_1.8.0)~~
- [ ] ~~[**filterM3Drop**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/filter-m3drop) (TBP: scater_1.8.0, M3Drop_1.4.0)~~

Methods: **keyword**: `method_omniclustering_pinned`

**Ignore the fix section from the
run_workflow.py**

- [x] [**ascend**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ascend-clustering) Note: also copy the `install_r.sh` file to the new project. --> Anthony
- [x] [**CIDR**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/cidr-clustering) --> Anthony
- [x] [**FlowSOM**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/flowsom-clustering) (TBP -> v1.12.0) see also the [parameter](https://hackmd.io/-ZHBAw06SK-oFAxb5tZz6g#New-version) section --> Anthony
- [x] [**Monocle**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/monocle-clustering) (TBP -> v2.26.0) --> Anthony
- [x] [**PCAHC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/pca-hc) --> Almut
- [x] [**PCAKmeans**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/pcakmeans-clustering) --> Anthony
- [x] [**PCAReduce**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/pcareduce-clustering) --> Anthony
- [x] [**RaceID2**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/raceid2-clustering) (TBP -> March 3 2017) --> Anthony
- [x] [**RtsneKmeans**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/rtsne-kmeans) (TBP -> v.0.13) --> Anthony
- [x] [**SAFE**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/safe-clustering) --> Anthony
- [x] [**SC3**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/clustering-sc3) (TBP -> v.1.8.0) --> Anthony
- [x] [**SC3svm**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/sc3-svm-clustering) (TBP -> v.1.8.0) --> Anthony
- [ ] [**Seurat**](https://gitlab.renkulab.io/omni_hackathon/omni_clustering/r-seurat) (TBP -> 2.3.1) see also the [parameter](https://hackmd.io/-ZHBAw06SK-oFAxb5tZz6g#New-version) section --> Mark
- [x] [**TSCAN**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/tscan-clustering) (TBP -> 1.18.0) --> Anthony

Parameters: **keyword**: `params_omniclustering_pinned`

- [x] [**clustering
parameters**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/omni-clustering-param) --> Almut

Metrics: **keyword**: `metrics_omniclustering_pinned`

- [x] [**ARI at true # clusters**](https://gitlab.renkulab.io/omnibenchmark/omni_clustering/ari-at-true-n-clust) (mclust) --> Anthony
- [ ] [**Adjusted Rand index**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/adjusted-rand-index) --> Anthony
- [x] [**Shannon Entropy**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/shannon-entropy) --> Anthony

(TBP) To Be Pinned; these projects have updated versions but need to be downgraded and pinned. See the original [wrappers](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/tree/master/Rscripts/clustering) for the code.

<details>
<summary> UPDATED VERSION (for later) </summary>

### B) Updated version

*Namespace and projects to come soon...*

List of projects and their former links:

Datasets:

- [ ] [**Koh**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/koh)
- [ ] [**KohTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ttc-koh)
- [ ] [**Kumar**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/kumar)
- [ ] [**KumarTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ttc-kumar)
- [ ] [**SimKumar4easy**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-4-easy)
- [ ] [**SimKumar4hard**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-4-hard)
- [ ] [**SimKumar8hard**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/sim-kumar-8-hard)
- [ ] [**Trapnell**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/trapnell)
- [ ] [**TrapnellTCC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/tcc-trapnell)
- [ ] [**ZhengMix4eq**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/zhengmix4eq)
- [ ] **ZhengMix4uneq** (project dropped, copy the content of ZhengMix4eq and change the [main
function](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/zhengmix4eq) to `sce_full_Zhengmix4uneq()`)
- [ ] [**ZhengMix8eq**](https://renkulab.io/gitlab/omnibenchmark/omni_data/zhengmix8eq)

Preprocessing:

- [ ] [**filterExpression**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/filter-expression)
- [ ] [**filterM3Drop**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/filter-m3drop)

Methods:

- [ ] [**ascend**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ascend-clustering) (TBU)
- [ ] [**CIDR**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/cidr-clustering) (TBU)
- [ ] [**FlowSOM**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/flowsom-clustering)
- [ ] [**Monocle**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/monocle-clustering)
- [ ] [**PCAHC**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/pca-hc)
- [ ] [**PCAKmeans**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/pcakmeans-clustering)
- [ ] [**PCAReduce**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/pcareduce-clustering)
- [ ] [**RaceID2**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/raceid2-clustering)
- [ ] [**RtsneKmeans**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/rtsne-kmeans)
- [ ] [**SAFE**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/safe-clustering) (TBU)
- [ ] [**SC3**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/clustering-sc3)
- [ ] [**SC3svm**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/sc3-svm-clustering)
- [ ] [**Seurat**](https://gitlab.renkulab.io/omni_hackathon/omni_clustering/r-seurat)
- [ ] [**TSCAN**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/tscan-clustering)

Parameters:

- [ ] [**clustering parameters**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/omni-clustering-param)

Metrics:

- [ ] [**ARI at true #
clusters**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/ari-at-true-n-clust) (mclust)
- [ ] [**Adjusted Rand index**](https://renkulab.io/gitlab/omnibenchmark/omni_clustering/adjusted-rand-index)
- [ ] [**Shannon Entropy**](https://renkulab.io/gitlab/omni_hackathon/omni_clustering/shannon-entropy)

(TBU): To Be Updated. The former renku projects have pinned versions and need to be updated to the latest versions.

</details>

## Procedure

- [x] ~~Create a new project from Renku website. For pinned projects, append a `_pinned` suffix to their name.~~ ~~https://renkulab.io/projects/new?data=eyJuYW1lc3BhY2UiOiJvbWJfYmVuY2htYXJrcy9vbW5pY2x1c3RlcmluZ19waW5uZWQiLCJ2aXNpYmlsaXR5IjoicHVibGljIiwidXJsIjoiaHR0cHM6Ly9naXRodWIuY29tL29tbmliZW5jaG1hcmsvY29udHJpYnV0ZWQtcHJvamVjdC10ZW1wbGF0ZXMiLCJyZWYiOiJDTElfcmVua3V2Ml9SM181In0%3D~~
- [ ] Pick the project from the [terraformed omnibenchmark](https://gitlab.renkulab.io/omb_benchmarks/omniclustering_pinned)
- [ ] Disable "*Enable shared runners for this project*": Settings > CI/CD > Runners
- [ ] Copy the following files from the original project:
    - `install.R`
    - files in `src/`
- [ ] If the project has to be downgraded (pinned benchmark) or upgraded (updated benchmark):
    - `install.R` (for down/upgrades): pin the relevant R package(s) to the old/new version
    - `src/[MAIN_SCRIPT].R` (for down/upgrades): test the wrapper on one dataset
    - `src/config.yaml`: modify the `data:`, `script:` and `benchmark_name` sections so that they match the current project name/title, main script and benchmark name.
- [ ] Double check that the fixed parameters correspond to the ones defined in [Duo's code](https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison/blob/master/Rscripts/parameter_settings/generate_parameter_settings.R#LL135C49-L137C1). For the updated version, choose an equivalent one.
- [ ] Make sure that the project is running (`run_workflow.py`) and double-check a few of the outputs.
:::warning
If you get a warning upon `omni_obj.create_dataset()` about the dataset name being already taken, generate the dataset using the renku cli:
```bash=
renku dataset create -k DATA_KEYWORD DATA_NAME
```
:::

- [ ] Add the project to the corresponding orchestrator's CI/CD:
    - Pinned: https://renkulab.io/gitlab/omb_benchmarks/omniclustering_pinned/orchestrator
    - Updated: https://renkulab.io/gitlab/omb_benchmarks/omniclustering/orchestrator/-/blob/master/.gitlab-ci.yml
- [ ] Report any encountered issue in the official documentation, under the 'Common bugs' section: https://github.com/omnibenchmark/documentation/tree/master/docs/04_bugs
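
When double-checking parameters, the K window and the FlowSOM grid rule from the "Parameters choice" section can be sketched as below. This is only an illustrative sketch, not the code in `filter_combinations.py` or in the FlowSOM wrapper: the function names `k_candidates` and `flowsom_grid` are hypothetical, clipping K to the [2-15] range is an assumption, and so is the handling of exactly 300 cells (the rule above leaves that boundary unspecified; here it falls in the middle band, while 600 cells falls under "else").

```python
def k_candidates(true_k, window=3, k_min=2, k_max=15):
    """Return the K values within +/- window of true_k, clipped to [k_min, k_max].

    Hypothetical helper: the clipping bounds are an assumption based on the
    [2-15] range given in the parameter table.
    """
    low = max(k_min, true_k - window)
    high = min(k_max, true_k + window)
    return list(range(low, high + 1))


def flowsom_grid(n_cells):
    """Return (xdim, ydim) for a dataset with n_cells cells.

    Hypothetical helper following the size rule in the parameter table.
    """
    if n_cells < 300:
        dim = 5
    elif n_cells < 600:  # assumption: exactly 300 cells falls in this band
        dim = 8
    else:
        dim = 15
    return dim, dim


if __name__ == "__main__":
    # A dataset with 4 true clusters and 450 cells
    print(k_candidates(4))
    print(flowsom_grid(450))
```

Comparing the values produced by such a helper with the combinations written to `src/filter_comb.json` is one way to spot a mismatch before the workflow is run on all datasets.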
