# UK Biobank Lab Tests
1. Create a folder called `ukbb_datagen` to run the commands in.
2. Place `ukbb_data_dir_structure.tar.gz` in the `ukbb_datagen` folder.
3. In the `ukbb_datagen` folder, run the following commands to set up the directories and raw files required for the datagen:
```shell=
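# Create the target directory (-p also creates /data if it is missing)
sudo mkdir -p /data/lab_tests
# Unpack the archive, then move its interim, processed and raw folders into place
tar -xf ukbb_data_dir_structure.tar.gz
sudo mv interim processed raw /data/lab_tests/
# Make the whole tree readable and writable for all users
sudo chmod -R 777 /data/lab_tests/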
```
4. Replace the sample files in `/data/lab_tests/raw/` with the actual files:
   * `labtests_ukb.icd.sample.csv`
   * `labtests_ukb.labtests.sample.csv`
   * `labtests_ukbiobank.labtests.dict.csv`
5. Clone the `mayo` repository in the `ukbb_datagen` folder.
6. Go to the datagen folder:
`cd [...]/ukbb_datagen/mayo/datagen/ukbb`
7. Create a `config.ini` file with the following contents, replacing the sample file names with the names of the actual files:
```ini
[matrix]
loc=/data/lab_tests/processed/ukbb/matrix
[labtests]
mapping=/data/lab_tests/processed/test_mapping
[diagnosis]
patient_diagnosis_loc=/data/lab_tests/processed/ukbb/icd
patient_diagnosis_filename=patient_diagnosis_map.npy
icd_mapping_loc=/data/lab_tests/raw
icd_code_mapping=icd10cm_codes_2019.txt
[ukbb]
dir_loc=/data/lab_tests/interim
mayo_link=/data/lab_tests/raw/UKBiobank_to_Mayo.csv
raw_matrix=/data/lab_tests/raw/labtests_ukb.labtests.sample.csv
matrix_cols=bioassays.columns.csv
pat_icd=/data/lab_tests/raw/labtests_ukb.icd.sample.csv
uk_cols_mapping=/data/lab_tests/raw/labtests_ukbiobank.labtests.dict.csv
```
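A typo in a section or key name may only surface once the script is running, so a quick check of `config.ini` can be worthwhile. The following is a minimal Python sketch and is not part of the repository: `REQUIRED` and `missing_keys` are hypothetical names, and the key list is copied from the listing above on the assumption that the script needs every one of them.

```python
import configparser

# Sections and keys copied from the config.ini listing above.
# Assumption: the datagen script expects all of them to be present.
REQUIRED = {
    "matrix": ["loc"],
    "labtests": ["mapping"],
    "diagnosis": [
        "patient_diagnosis_loc",
        "patient_diagnosis_filename",
        "icd_mapping_loc",
        "icd_code_mapping",
    ],
    "ukbb": [
        "dir_loc",
        "mayo_link",
        "raw_matrix",
        "matrix_cols",
        "pat_icd",
        "uk_cols_mapping",
    ],
}


def missing_keys(config_path):
    """Return "section.key" names that are absent or empty in config_path."""
    parser = configparser.ConfigParser()
    parser.read(config_path)  # a missing file is silently skipped
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            # fallback="" covers both a missing section and a missing key
            value = parser.get(section, key, fallback="")
            if not value.strip():
                missing.append(f"{section}.{key}")
    return missing


if __name__ == "__main__":
    problems = missing_keys("config.ini")
    if problems:
        print("Missing or empty entries:", ", ".join(problems))
    else:
        print("config.ini has all expected entries")
```

Run from the folder that contains `config.ini`, this prints any entries that are missing or left empty.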
8. Create a virtual environment outside of the project, preferably in a separate folder, and install the dependencies:
   * Create the folder `~/venvs`.
   * In that folder, run these commands in order:
      * `virtualenv -p python3 (environment name)`
      * `source (environment name)/bin/activate`
   * Navigate to `mayo/core-api`.
   * Run `pip install -r requirements.pip`.
   * Navigate to `ukbb_datagen/mayo/datagen/ukbb`.
9. Run `uk_2_sparse.py`:
`python uk_2_sparse.py`
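
Before running the script, it may help to confirm that the input files named in the `[ukbb]` section of `config.ini` actually exist, for example after swapping the sample files for the real ones. The sketch below is an illustration under stated assumptions and is not part of the repository: `INPUT_KEYS` and `missing_inputs` are hypothetical names, and `config.ini` is assumed to be in the current folder. `matrix_cols` is skipped because it is a bare file name and these steps do not say which folder the script resolves it against.

```python
import configparser
import os

# Keys in [ukbb] whose values are full file paths in the config shown in step 7.
# matrix_cols is left out: it is a bare file name, and the folder it is
# resolved against is not stated in these instructions.
INPUT_KEYS = ["mayo_link", "raw_matrix", "pat_icd", "uk_cols_mapping"]


def missing_inputs(config_path):
    """Return (key, path) pairs from [ukbb] that do not point to an existing file."""
    parser = configparser.ConfigParser()
    parser.read(config_path)  # a missing config file yields an empty parser
    missing = []
    for key in INPUT_KEYS:
        path = parser.get("ukbb", key, fallback="")
        if not os.path.isfile(path):
            missing.append((key, path))
    return missing


if __name__ == "__main__":
    for key, path in missing_inputs("config.ini"):
        print(f"[ukbb] {key}: file not found: {path!r}")
```

If nothing is printed, every listed input file was found.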