# Steps to run the CELLECT S-LDSC pipeline
## Step 1: Install packages and environments
### Step 1A: Clone CELLECT GitHub repo
> Comment: The `--recurse-submodules` flag is needed to clone the git submodule 'ldsc' (pascaltimshel/ldsc), which is a modified version of the original ldsc repository. Cloning the repo might take a few minutes because the CELLECT data files (> 1-3 GB) will be downloaded. To skip downloading the data files, use `GIT_LFS_SKIP_SMUDGE=1 git clone --recurse-submodules https://github.com/perslab/CELLECT.git` instead.
```
git clone --recurse-submodules https://github.com/perslab/CELLECT.git
# Use Git-LFS to fetch the data files (run inside the cloned repo)
git lfs fetch
git lfs checkout
```
### Step 1B: Create a conda env for CELLECT
> Note: Make sure to create and download conda envs on a VM (qrsh) with more than 4G of memory.
```
conda create --prefix /broad/mcl/members_dir/mmurali/envs/cellect_env
conda activate /broad/mcl/members_dir/mmurali/envs/cellect_env
```
### Step 1C: Install snakemake via conda
```
conda install -c conda-forge mamba
mamba create -c conda-forge -c bioconda -n snakemake snakemake
# To use snakemake, use the following command
conda activate /broad/mcl/members_dir/mmurali/envs/cellect_env/envs/snakemake
```
### Step 1D: Clone CELLEX GitHub repo
```
git clone https://github.com/perslab/CELLEX.git --branch develop --single-branch
cd CELLEX
```
### Step 1E: Create a conda env for CELLEX
> Note: Make sure to create and download conda envs on a VM (qrsh) with more than 4G of memory.
```
conda create --prefix /broad/mcl/members_dir/mmurali/envs/cellex_env
conda activate /broad/mcl/members_dir/mmurali/envs/cellex_env
```
### Step 1F: Install from source using pip
```sh
# Alternatively, install the latest version from PyPI
pip install cellex
# If importing cellex fails with
#   AttributeError: module 'numpy' has no attribute 'float'
# pin Python and an older NumPy release:
conda install python=3.11
python -m pip install numpy==1.23.3
# Verify that the imports work in python3
python -c "import pandas as pd; import numpy as np; import cellex"
```
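The `AttributeError` above comes from the `np.float` alias, which NumPy deprecated in 1.20 and removed in 1.24, so any NumPy release before 1.24 avoids it. Below is a minimal sketch for checking a version string against that cutoff. The helper name `has_np_float_alias` is hypothetical, and only the standard library is used.

```python
import re


def has_np_float_alias(numpy_version: str) -> bool:
    """Return True if this NumPy version still ships the `np.float` alias.

    The alias was removed in NumPy 1.24, so versions below 1.24 have it.
    """
    match = re.match(r"(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {numpy_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 24)


if __name__ == "__main__":
    # Print the result for the two versions pinned in this README and two newer ones.
    for version in ("1.22.4", "1.23.3", "1.24.0", "2.0.1"):
        print(version, has_np_float_alias(version))
```

In an activated environment, passing `numpy.__version__` to the helper shows whether the installed NumPy is old enough for this CELLEX release.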
---
## Step 2: Follow the CELLECT LDSC Tutorial ([link](https://github.com/perslab/CELLECT/wiki/CELLECT-LDSC-Tutorial))
### Step 2A: Create conda environment for munging
```
cd ~/CELLECT
conda env create -f ldsc/environment_munge_ldsc.yml --prefix /broad/mcl/members_dir/mmurali/envs/munge_ldsc
conda activate /broad/mcl/members_dir/mmurali/envs/munge_ldsc
```
### Step 2B: Download GWAS sumstats
> BMI GWAS from [Yengo (HMG, 2018)](https://academic.oup.com/hmg/article/27/20/3641/5067845?login=true) and Educational Attainment GWAS from [Lee (Nat. Gen., 2018)](https://www.nature.com/articles/s41588-018-0147-3).
```
wget https://portals.broadinstitute.org/collaboration/giant/images/c/c8/Meta-analysis_Locke_et_al%2BUKBiobank_2018_UPDATED.txt.gz -P example/
wget https://www.dropbox.com/s/ho58e9jmytmpaf8/GWAS_EA_excl23andMe.txt -P example/
```
### Step 2C: Munge the GWAS sumstats
> Note: Re-download `w_hm3.snplist` manually, as the file doesn't get downloaded with the repo because of the size limit.
```
python ldsc/mtag_munge.py \
--sumstats example/GWAS_EA_excl23andMe.txt \
--merge-alleles data/ldsc/w_hm3.snplist \
--n-value 766345 \
--keep-pval \
--p PVAL \
--out example/EA3_Lee2018
python ldsc/mtag_munge.py \
--sumstats example/Meta-analysis_Locke_et_al+UKBiobank_2018_UPDATED.txt.gz \
--a1 Tested_Allele \
--a2 Other_Allele \
--merge-alleles data/ldsc/w_hm3.snplist \
--keep-pval \
--p PVAL \
--out example/BMI_Yengo2018
```
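Before moving on, it can help to confirm that munging produced a non-empty file with the expected columns. Below is a minimal sketch using only the standard library. It assumes the munged output is a gzipped, tab-separated file with a header row; the `.sumstats.gz` suffix appended to the `--out` prefix is an assumption, and `summarize_sumstats` is a hypothetical helper.

```python
import csv
import gzip


def summarize_sumstats(path):
    """Return (column_names, n_rows) for a gzipped, tab-separated file.

    Assumes the first line is a header row; n_rows counts the lines after it.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows
```

For example, `summarize_sumstats("example/EA3_Lee2018.sumstats.gz")` would return the column names and the number of SNP rows, provided the file exists under that assumed name.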
### Step 2D: Generate cell-type specificity input using CELLEX
> If you get an error with numpy, run `python -m pip install numpy==1.22.4`.
```
# Deactivate munge_ldsc
conda deactivate
# Activate cellex_env
conda activate /broad/mcl/members_dir/mmurali/envs/cellex_env
# Then, in python3
import numpy as np
import pandas as pd
import cellex
# Load input data and metadata
mousebrain_sc_rnaseq_data=pd.read_csv("abc.csv", index_col=0)
celltype_labels=pd.read_csv("xyz.csv", index_col=0)
# Create ESObject and compute ESmu
eso = cellex.ESObject(data=mousebrain_sc_rnaseq_data, annotation=celltype_labels, verbose=True)
eso.compute(verbose=True)
# View Expression Specificity scores
eso.results["esmu"]
eso.results["esmu"].to_csv("mousebrain-test.csv.gz")
```
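CELLEX needs the expression matrix and the annotation to describe the same cells, so a quick check of the two input files before building the `ESObject` can save a failed run. Below is a minimal sketch using only the standard library. It assumes the expression CSV has genes as rows and cell IDs as column headers, and that the metadata CSV has a header row with cell IDs in its first column (consistent with the `index_col=0` calls above). `unmatched_cell_ids` is a hypothetical helper, not part of CELLEX.

```python
import csv


def unmatched_cell_ids(expression_csv, metadata_csv):
    """Return (cells missing from metadata, cells missing from expression).

    Both return values are sorted lists of cell IDs.
    """
    # Cell IDs are the column headers of the expression matrix.
    with open(expression_csv, newline="") as handle:
        header = next(csv.reader(handle))
    expression_cells = set(header[1:])  # first header field labels the gene column

    # Cell IDs are the first column of the metadata table.
    with open(metadata_csv, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the metadata header row
        metadata_cells = {row[0] for row in reader if row}

    return (
        sorted(expression_cells - metadata_cells),
        sorted(metadata_cells - expression_cells),
    )
```

Two empty lists mean every cell in the expression matrix has a label and vice versa.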
### Step 2E: Run CELLECT-LDSC
```
# Create and activate the snakemake env
conda create --prefix /broad/mcl/members_dir/mmurali/envs/snakemake
conda activate /broad/mcl/members_dir/mmurali/envs/snakemake
conda config --add channels bioconda
conda config --add channels conda-forge
conda install snakemake
# Within the CELLECT directory
snakemake --use-conda -s cellect-ldsc.snakefile --configfile config.yml --cores 20 -j 50 --conda-frontend conda
```
Update: This ran but didn't produce any output.
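
One thing worth ruling out when a run finishes without output is a missing or empty input file. Below is a minimal sketch using only the standard library. The file names in the example list are assumptions (the `.sumstats.gz` suffix in particular) and should be replaced with the paths actually listed in `config.yml`. `missing_inputs` is a hypothetical helper.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the subset of `paths` that are not files or are empty files."""
    missing = []
    for path in map(Path, paths):
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    expected = [
        "example/EA3_Lee2018.sumstats.gz",    # assumed munge output name
        "example/BMI_Yengo2018.sumstats.gz",  # assumed munge output name
        "mousebrain-test.csv.gz",             # ESmu file written in Step 2D
    ]
    print(missing_inputs(expected))
```

An empty list means all listed inputs exist and are non-empty, in which case the cause lies elsewhere.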
## Citations:
1) Finucane (Nature Genetics, 2015): Partitioning heritability by functional annotation using genome-wide association summary statistics
2) Timshel (eLife, 2020): Genetic mapping of etiologic brain cell types for obesity