---
tags: data
---
Author: Yingxiao Yan
Access data --> Explore data --> Prepare data --> Analyze & report data --> Export results

# SIMPLER-Västerås clinical (SIMPLER-VC)
## 1. Original data
### 1.1. Inclusion of data: qualitative description
| data type | brief description |
| -------- | -------- |
| Metabolomics|POS, NEG, metadata|
| Clinical variables| |
| diseases| |
| drugs| |
| dxa| very few individuals with metabolomics data |
| family_history2008| |
| clinical quest| |
| questionnaire_data| |
| microbiota data| |
### 1.2. Inclusion of data: quantitative description of original data
| Metabolomics data | Nu.obs | Nu.vars | Additional comments |
| ------------------------------------------------- | ---------------------- | ------------------- | ----------------------------------------------------------------------------------------------------------- |
| 240607_RP_POS_ClusterBreakdown.xlsx | 5932 | 4 | Additional info on clustered features, RP mode (RPC, mz, rt, mz_rt) |
|240616_RP_NEG_ClusterBreakdown.xlsx | 4751 | 4 | Additional info on clustered features, RN mode (RPC, mz, rt, mz_rt) |
| **240607_RP_POS_FinalPT.xlsx** | 7721 | 3435 | Final peak table, RP mode; the first column is SIMPkey |
| **240616_RP_NEG_FinalPT.xlsx** | 7629 | 3414 | Final peak table, RN mode |

##cluster_names: cluster order and cluster retention time joined by @ (order@rt)
##mz: feature m/z value
| other data group 1 | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|clinical_variables.csv|12808|23| see clinical_history.xlsx for metadata|
|diseases.csv|2472|11 | ??|
|drugs.csv |12586|21| ???? |
|dxa.csv|5022|7|???|
|family_history2008.csv|12819|10| ???|
|pop.csv|694|26| OC|
|pfas.csv|692|9| PFAS|

**#clinical_variables**
##waist: waist circumference (cm)
##hip: hip circumference (cm)
##sys1: Systolic BP, 1st measure (mmHg)
##diast1: Diastolic BP, 1st measure (mmHg)
##sys2: Systolic BP, 2nd measure (mmHg)
##diast2: Diastolic BP, 2nd measure (mmHg)
##pHDL_Chol: P-HDL_Cholesterol (mmol/L)
##pLDL_Chol: P-LDL_Cholesterol (mmol/L)
##pTrig: P-Triglycerides (mmol/L)
##pALAT: P-ALAT (µkat/L)
##pCRP: P-CRP (mg/L)
##pGlucose: P-Glucose (mmol/L)
##pCreat: P-Creatinine (µmol/L)
##sInsulin: S-Insulin (mU/L)
##INS_mU_l_: insulin (mU/L), fresh material, approx. 4400 (?); see Mitchell 2018 paper
##IGFBP1: Insulin-like growth factor-binding protein 1 (µg/L), frozen material; see Mitchell 2018 paper
**#diseases**
##E11: ICD-10 type 2 diabetes
##I63: ICD-10 stroke
##I21: ICD-10 myocardial infarction
##K50: ICD-10 Crohn's disease
##K51: ICD-10 ulcerative colitis
##BV: before visit
##AV: after visit
**#drugs**
##A10: DRUGS USED IN DIABETES
##C01: CARDIAC THERAPY: for the treatment of cardiovascular conditions.
No data for C05 and C06.
C02, C03, C07, C08 and C09 are used for hypertension.
##C02: ANTIHYPERTENSIVES
##C03: DIURETICS
##C04: PERIPHERAL VASODILATORS
##C07: BETA BLOCKING AGENTS
##C08: CALCIUM CHANNEL BLOCKERS
##C09: AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM
##C10: LIPID MODIFYING AGENTS
##J01: ANTIBACTERIALS FOR SYSTEMIC USE
##BV: before visit
##AV: after visit
**#dxa** (ask Calle; see Mitchell 2018 paper)
##fett_total: total fat ("fett" is Swedish for fat)
##lean_total: total lean mass
##fett_android: android fat
##fett_gynoid: gynoid fat
##height
##weight
**#family_history2008**
##Answers are recorded for mother, father, sibling and relatives; DK = don't know
##C_fhx01: breast cancer
##C_fhx02: colon cancer
##C_fhx03: prostate cancer
##C_fhx04: other cancer
##C_fhx05: heart attack before age 60
##C_fhx06: rheumatoid arthritis
##C_fhx07: psoriasis
##C_fhx08: diabetes
##C_fhx09: hypertension
| other data group 2 | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|clinical_quest.csv|12792|239| 231 dietary variables, 5 other variables|
|questionnaire_data.csv|12819|85| 81 questionnaire variables, 4 other variables|

Cycle 1 = first visit (i.e. SMCC Uppsala 2003-2009); cycle 2 = second visit (i.e. SMCC Uppsala 2015-2023).
**#clinical quest data**
##simpkey: unique identifier
##sex
##age
##plats (site): Stockholm 107, Uppsala 5320, Västerås 7365
##place: clinic 12401, home 391
##dateofbirth
##cycle: S, U1A, U1B, U1C, U1D, UH, V
U1A = FFQ 1997, n = 462
U1B = FFQ 2004 V1, n = 4348
U1C = FFQ 2004 V2, n = 148
U1D = 2009 lifestyle, n = 58
UH = sampled at home in Uppsala, n = 304
S = sampled in Stockholm, n = 107
V1 = Västerås cycle 1, n = 7365
##qyear: year in which the questionnaire was answered
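
The cycle counts above add up to the site counts under ##plats (every U* code is Uppsala) and to the 12792 rows of clinical_quest.csv. The small Python check below uses numbers copied from this note rather than read from the file. The code-to-site mapping is an assumption based on the descriptions above, and the key "V1" follows the list rather than the "V" under ##cycle.

```python
# Counts per cycle code, copied from the list above (not read from the data file).
cycle_counts = {
    "U1A": 462,   # FFQ 1997
    "U1B": 4348,  # FFQ 2004 V1
    "U1C": 148,   # FFQ 2004 V2
    "U1D": 58,    # 2009 lifestyle
    "UH": 304,    # sampled at home in Uppsala
    "S": 107,     # sampled in Stockholm
    "V1": 7365,   # Västerås cycle 1
}
# Site counts copied from ##plats.
site_counts = {"Stockholm": 107, "Uppsala": 5320, "Västerås": 7365}


def site_of(code):
    """Assumed mapping: U* codes are Uppsala, S is Stockholm, V* is Västerås."""
    if code.startswith("U"):
        return "Uppsala"
    if code == "S":
        return "Stockholm"
    if code.startswith("V"):
        return "Västerås"
    raise ValueError(f"unknown cycle code: {code}")


# Sum the cycle counts per site.
totals = {}
for code, n in cycle_counts.items():
    site = site_of(code)
    totals[site] = totals.get(site, 0) + n

print(totals == site_counts)       # True
print(sum(cycle_counts.values()))  # 12792
```
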
See the subcohort variable list for the meaning of each food variable. The original variables follow the format g_food001, g_bev02, g_alc03. The derived food variables are in
Many of the variables contain NAs, and these must be removed before running the MUVR models.
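
The Python sketch below illustrates the two points above. First, it picks out the original food variables by their naming pattern, assuming g_food, g_bev and g_alc followed by digits are the only prefixes. Second, it keeps only complete rows before modelling. Rows are plain dictionaries with None standing for NA, and `der_veg` is a hypothetical derived-variable name. In the workflow itself this is done in the R scripts, not with this code.

```python
import re

# Assumed pattern for original food variables, e.g. g_food001, g_bev02, g_alc03.
ORIGINAL_FOOD = re.compile(r"^g_(food|bev|alc)\d+$")


def original_food_columns(columns):
    """Return the column names that match the original food-variable pattern."""
    return [c for c in columns if ORIGINAL_FOOD.match(c)]


def complete_rows(rows, columns):
    """Keep only rows with no missing value (None) in the given columns."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


# Toy rows; 'der_veg' is a hypothetical derived-variable name.
rows = [
    {"simpkey": 1, "g_food001": 2.0, "g_bev02": 1.0, "der_veg": 3.1},
    {"simpkey": 2, "g_food001": None, "g_bev02": 0.5, "der_veg": 2.0},
    {"simpkey": 3, "g_food001": 1.5, "g_bev02": 0.0, "der_veg": None},
]
food_cols = original_food_columns(rows[0].keys())
print(food_cols)                                               # ['g_food001', 'g_bev02']
print([r["simpkey"] for r in complete_rows(rows, food_cols)])  # [1, 3]
```

Only the columns that enter the model are checked, so row 3 survives despite its missing derived value.
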
**#questionnaire data**
##A_: 1987 questionnaire
##B_: 1997 questionnaire
##C_: 2008 health questionnaire
##D_: 2009 lifestyle questionnaire
##E_: 2019 health questionnaire
##F_: 2019 lifestyle questionnaire
##simpkey: unique identifier
##birthdate (1914-1952)
##deathdate (2005-2023)
##visitdate (2003-2019)
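
The prefix letter therefore identifies the questionnaire a variable comes from. Below is a minimal Python lookup. It assumes every questionnaire variable name is a prefix letter followed by an underscore, as in C_waist or B_der_bmi. The function name is hypothetical.

```python
# Prefix letters and questionnaires, as listed above.
PREFIX_QUESTIONNAIRE = {
    "A": "1987",
    "B": "1997",
    "C": "2008 health",
    "D": "2009 lifestyle",
    "E": "2019 health",
    "F": "2019 lifestyle",
}


def questionnaire_of(variable):
    """Return the questionnaire for a name like 'C_waist', or None if it has no known prefix."""
    prefix, sep, _ = variable.partition("_")
    if sep and prefix in PREFIX_QUESTIONNAIRE:
        return PREFIX_QUESTIONNAIRE[prefix]
    return None


print(questionnaire_of("C_waist"))  # 2008 health
print(questionnaire_of("simpkey"))  # None
```
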
| Variables |Type (should be) |Explanation|
| -------- | -------- |-------- |
| A_age| numeric|Age when answering questionnaire (Q87=SMC Baseline, Q97=COSM Baseline)|
| B_age| numeric|Age when answering questionnaire (Q87=SMC Baseline, Q97=COSM Baseline)|
|C_age |numeric|Age when answering questionnaire (Q87=SMC Baseline, Q97=COSM Baseline)|
|D_age |numeric|Age when answering questionnaire (Q87=SMC Baseline, Q97=COSM Baseline)|
|E_age |numeric|Age when answering questionnaire (Q87=SMC Baseline, Q97=COSM Baseline)|
|C_health|factor|How is your health? 1=Very good, 2=Good, 3=Neither good nor bad, 4=Bad, 5=Very bad |
|E_health |factor|How is your health? 1=Very good, 2=Good, 3=Neither good nor bad, 4=Bad, 5=Very bad|
|B_der_height|numeric|B_height2: height 1997; if the 1997 value is missing, the 1987 value is used|
|B_der_weight|numeric|B_weight1: weight 1997; if the 1997 value is missing, the 1987 value is used|
|B_der_bmi |numeric|BMI = weight/(height)^2|
|C_bp1 |factor|If no BP medication: has your blood pressure been measured in the past 3 years? 1=No, 2=Yes|
|C_bp2 |factor|If BP measured - what result? 1=Too low, 2=Normal, 3=Slightly elevated, 4=Markedly elevated|
|C_waist |numeric|Waist (cm)|
|C_hip |numeric|Hip (cm)|
| A_eat01 |factor |Type of diet. 1=Omnivorous, 2=Only lactovegetarian (no meat, fish or egg), 3=Mostly lactovegetarian, sometimes eats fish and eggs, 4=Vegan, 5=Other|
|D_eat02 |factor|What is your main type of diet? 1=Mixed, 2=Vegetarian, 3=Vegan|
| A_educ1 |factor |Education: 1, 2, 3=Primary school (≤ 9 years), 4=High school (10-12 years), 5=College/University (≥12 years), 6=Other|
|B_der_educ1 |factor|B_educ2-B_educ7: highest education level 1=Primary school <=9 years, 2=High school 10-12 years, 3=University >12 years|
| B_smok01_u|factor |Ever smoked cigarettes regularly 1=Yes, 0=No |
| B_smok03_n|factor |Still smoking, 1=Yes, 0=No |
|B_der_smok01 |factor|B_smok01:Ever smoker 1=Yes, 0=No|
|B_der_smok03 |factor|B_smok03:Smoking status 1=Yes, 0=No|
|D_smok01_u |factor|Ever smoked cigarettes regularly. 1=No, 2=Yes, currently, 3=Yes, but I stopped|
|B_der_smok_u |factor|B_smok01+B_smok03:Ever smoker/status 1=Current, 2=Ex, 3=Never|
|B_der_alc09_u |factor|B_alc14_u+B_alc15_u: Ever drinker 0=Never, 1=Ex, 2=Current. Name to be confirmed: the variable list calls it B_der_alc14_u|
|B_der_act25 |numeric|Current total activity score (MET*hours/d)|
|D_act20 |factor|Exercise, this year 1= Almost never, 2=<1 hour/week, 3=1 hour/week, 4=2-3 hours/week, 5=4-5 hours/week, 6=>5 hours/week|
| B_med06_u| factor|Ever used cortisone tablets, 1=No, 2=Yes|
|C_med08_u|factor|Ever used cortisone tablets 1=No, 2=Yes|
|C_med09_x |factor|Diabetes, treatment. 1=Insulin, 2=Tablets, 3=Dietary advice, 9=More than one|
|C_med010_u |factor|Have you used antibiotics during the past 10 years? 1=No, 2=Yes|
|C_fhx05_no |factor|Relatives diagnosed with heart attack before age 60: no. 1=No|
|C_fhx05_fm |factor|Heart attack before age 60: mother. 1=Yes|
|C_fhx05_ff |factor|Heart attack before age 60: father. 1=Yes|
|C_fhx05_fs |factor|Heart attack before age 60: sibling. 1=Yes|
|C_fhx05_dk |factor|Relatives diagnosed with heart attack before age 60: don't know. 1=Don't know|
|C_fhx08_no |factor|Relatives diagnosed with diabetes: no. 1=No|
|C_fhx08_fm |factor|Diabetes: mother. 1=Yes|
|C_fhx08_ff |factor|Diabetes: father. 1=Yes|
|C_fhx08_fs |factor|Diabetes: sibling. 1=Yes|
|C_fhx08_dk |factor|Relatives diagnosed with diabetes: don't know. 1=Don't know|
|C_fhx09_no |factor|Relatives diagnosed with hypertension: no. 1=No|
|C_fhx09_fm |factor|Hypertension: mother. 1=Yes|
|C_fhx09_ff |factor|Hypertension: father. 1=Yes|
|C_fhx09_fs |factor|Hypertension: sibling. 1=Yes|
|C_fhx09_dk |factor|Relatives diagnosed with hypertension: don't know. 1=Don't know|
|B_der_diag02_x|factor|B_diag02_x+B_diag02_y:Diag high blood pressure 1=Yes, 0=No|
|B_der_diag03_x|factor| B_diag03_x+B_diag03_y:Diag high cholesterol 1=Yes, 0=No|
|B_der_diag06_x|factor|B_diag06_x+B_diag06_y:Diag heart attack 1=Yes, 0=No|
|B_der_diag07_x|factor| B_diag07_x+B_diag07_y:Diag stroke 1=Yes, 0=No|
|B_der_diag08_x |factor|B_diag08_x+B_diag08_y:Diag diabetes 1=Yes, 0=No|
| B_diag02_x| factor|Ever diagnosed with: Hypertension 1=Yes, 0=No|
| B_diag02_a| numeric |Hypertension: at what age|
|B_diag02_y1 |numeric|Hypertension: what year|
| B_diag03_x| factor|Ever diagnosed with: High cholesterol 1=Yes, 0=No |
| B_diag03_a|numeric |High cholesterol: at what age|
|B_diag03_y1 |numeric|High cholesterol: what year|
| B_diag05_x| factor |Ever diagnosed with: Angina pectoris 1=Yes, 0=No|
| B_diag05_a| numeric|Angina pectoris: at what age|
|B_diag05_y1 |numeric|Angina pectoris: what year|
| B_diag06_x| factor |Ever diagnosed with: Heart attack 1=Yes, 0=No |
| B_diag06_a| numeric|Heart attack: at what age|
|B_diag06_y1 |numeric|Heart attack: what year|
| B_diag07_x| factor |Ever diagnosed with: Stroke 1=Yes, 0=No|
| B_diag07_a| numeric|Stroke: at what age|
|B_diag07_y1 |numeric|Stroke: what year|
| B_diag08_x| factor|Ever diagnosed with: Diabetes 1=Yes, 0=No|
| B_diag08_a|numeric |Diabetes: at what age|
|B_diag08_y1 |numeric|Diabetes: what year|
|C_diag02_x |factor |Ever diagnosed with: Hypertension, 1=Yes, 0=No|
|C_diag03_x|factor | Ever diagnosed with: High cholesterol, 1=Yes, 0=No|
|C_diag05_x |factor |Ever diagnosed with: Angina pectoris, 1=Yes, 0=No|
|C_diag16_x |factor |Heart failure 1=Yes|
|C_diag08_x |factor |Ever diagnosed with: Diabetes, 1=Yes, 0=No|
|C_diag08_a |numeric|Diabetes: at what age|
|E_diag02_x|factor |Ever diagnosed with: Hypertension 1=Yes, 0=No|
|E_diag03_x |factor | Ever diagnosed with: High cholesterol 1=Yes, 0=No|
|E_diag05_x |factor | Ever diagnosed with: Angina pectoris 1=Yes, 0=No|
|E_diag16_x |factor |Heart failure 1=Yes|
|E_diag08_x |factor |Ever diagnosed with: Diabetes 1=Yes, 0=No|
|E_diag08_a |numeric|Diabetes: at what age|
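
B_der_height and B_der_weight take the 1997 value and fall back to 1987, and B_der_bmi is weight divided by height squared. The Python sketch below shows that derivation for one row. It assumes height in cm and weight in kg, since the table does not state units. It also uses hypothetical names A_height and A_weight for the 1987 inputs. It illustrates the rule and is not the code that produced the B_der_ variables.

```python
def coalesce(*values):
    """Return the first value that is not None (like dplyr::coalesce in R)."""
    for v in values:
        if v is not None:
            return v
    return None


def derive_bmi(row):
    """BMI from 1997 height/weight, falling back to 1987 values.

    A_height and A_weight are hypothetical names for the 1987 inputs.
    Assumes height in cm and weight in kg.
    """
    height_cm = coalesce(row.get("B_height2"), row.get("A_height"))
    weight_kg = coalesce(row.get("B_weight1"), row.get("A_weight"))
    if height_cm is None or weight_kg is None:
        return None
    height_m = height_cm / 100  # cm -> m (unit assumption)
    return weight_kg / height_m ** 2


# 1997 height missing, so the 1987 height is used.
print(round(derive_bmi({"B_height2": None, "A_height": 165, "B_weight1": 68}), 1))  # 25.0
```
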

| other data group 3: microbiota | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|simpler_metagenomics_metaphlanalpha_diversity_v2.0.tsv|6150|3| alpha diversity (dimensions as read in 2.3.1)|
|simpler_metagenomics_metaphlan_bray_curtis_dissimilarities_v2.0.tsv|6150|6151| beta diversity, Bray-Curtis (dimensions as read in 2.3.1)|
|simpler_metagenomics_metaphlan_bray_curtis_dissimilarities_v2.0.tsv|6150|7852| file name repeated in the source list; dimensions match metaphlan_abundance.tsv in 2.3.1, file name to be confirmed|
|technical_variables.tsv|6150|23| technical variables (dimensions as read in 2.3.1)|
## 2. Round 1: R script workflow
The metabolomics data are in:
Positive mode: `/proj/simp2023018/240607_COSMC_POS/240517-COSMC-POS/results/pipe4/`
Negative mode: `/proj/simp2023018/240617-COSMC-NEG/results/pipe4/`
Other data group 1 is in:
`/castor/project/proj/Dataleverans/`
All scripts are in:
`/castor/project/proj/Yan_threecohort/`
Workflow:
get_rawdata_VC.R -->
clean_data_VC.R --> microbiota_explore.R, foodvariable_explore.R -->
match_data_VC.R -->
confounder_imputation_VC.R --> confounder_build_VC.R -->
preclean_analysis_VC.R -->
outcome_r_f_s_VC.R -->
exposure_outcome_r_f_s_VC.R -->
after_O_E_features_selected_VC.R
All intermediate and final processing results are in:
`/castor/project/proj/Yan_threecohort/data_preprocessing_results/`
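
The scripts above run one after another, each reading the .rda files written by earlier steps. Below is a minimal Python driver for that order. It assumes each script runs non-interactively with `Rscript` from the scripts folder. By default it only prints the commands; pass dry_run=False to execute them. The two exploration scripts, listed side by side in the workflow, are simply run in sequence here.

```python
import subprocess
from pathlib import Path

# Script order copied from the workflow above.
SCRIPTS = [
    "get_rawdata_VC.R",
    "clean_data_VC.R",
    "microbiota_explore.R",
    "foodvariable_explore.R",
    "match_data_VC.R",
    "confounder_imputation_VC.R",
    "confounder_build_VC.R",
    "preclean_analysis_VC.R",
    "outcome_r_f_s_VC.R",
    "exposure_outcome_r_f_s_VC.R",
    "after_O_E_features_selected_VC.R",
]


def run_pipeline(script_dir, dry_run=True):
    """Run each script with Rscript in order, stopping at the first failure.

    With dry_run=True the commands are only printed and returned.
    """
    commands = [["Rscript", str(Path(script_dir) / s)] for s in SCRIPTS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a script exits with an error;
            # cwd lets scripts resolve relative paths from the scripts folder.
            subprocess.run(cmd, check=True, cwd=script_dir)
    return commands


if __name__ == "__main__":
    run_pipeline("/castor/project/proj/Yan_threecohort/")
```
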
### 2.1 get_rawdata
Input data files are:
```
240607_RP_POS_ClusterBreakdown.xlsx
240616_RP_NEG_ClusterBreakdown.xlsx
240607_RP_POS_FinalPT.xlsx
240616_RP_NEG_FinalPT.xlsx
clinical_variables.csv
diseases.csv
drugs.csv
dxa.csv
family_history2008.csv
clinical_quest.csv
questionnaire_data.csv
```
The script is
`get_rawdata_VC.R`
The output is
`rawdata_tobecleaned_VC.rda`
which contains
|dataframe | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|clinical_variables|12808|23| |
|diseases|2317| 7 | |
|drugs |11020|19| |
|dxa|5022|7| |
|family_history2008|12819|10| |
|clinical_quest|12792|239||
|questionnaire_data|12815|85||
| metabolomics_POS_clusterbreakdown | 5932 | 4 | Additional info on clustered features, RP mode (RPC, mz, rt, mz_rt) |
|metabolomics_NEG_clusterbreakdown | 4751 | 4 | Additional info on clustered features, RN mode (RPC, mz, rt, mz_rt) |
|metabolomics_POS | 7721 | 3435 | Final peak table, RP mode; the first column is SIMPkey |
| metabolomics_NEG | 7629 | 3414 | Final peak table, RN mode |
| metabolomics_VC | 7626 | 6848 | POS and NEG peak tables combined; 6848 = 3435 + 3414 - 1 is consistent with a single shared SIMPkey column |
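
metabolomics_VC has 6848 = 3435 + 3414 - 1 variables and 7626 rows, fewer than either peak table. Both counts are consistent with an inner join of the POS and NEG tables on SIMPkey that keeps the key column once. That reading is an inference from the counts and has not been confirmed from get_rawdata_VC.R. The Python sketch below shows such a join, with each table as {SIMPkey: {feature: intensity}} and hypothetical feature names.

```python
def inner_join_on_key(pos, neg):
    """Inner-join two peak tables given as {simpkey: {feature: intensity}}.

    Only SIMPkeys present in both tables are kept, and the key appears once.
    Assumes POS and NEG feature names do not overlap.
    """
    joined = {}
    for key in pos.keys() & neg.keys():
        joined[key] = {**pos[key], **neg[key]}
    return joined


# Toy tables; RP_*/RN_* feature names are hypothetical.
pos = {"S1": {"RP_1": 10.0, "RP_2": 3.0}, "S2": {"RP_1": 8.0, "RP_2": 1.0}}
neg = {"S1": {"RN_1": 5.0}, "S3": {"RN_1": 7.0}}
vc = inner_join_on_key(pos, neg)
print(sorted(vc))  # ['S1']
print(vc["S1"])    # {'RP_1': 10.0, 'RP_2': 3.0, 'RN_1': 5.0}

# Column arithmetic from the table: each peak table carries one SIMPkey column.
assert 3435 + 3414 - 1 == 6848
```
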
### 2.2 clean_data
Input data file is:
`rawdata_tobecleaned_VC.rda`
The script is
`clean_data_VC.R`
The output is
`cleanedata_tobematched_VC.rda`
which contains
|dataframe | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|clinical_variables|12808|27| |
|diseases|2317| 16 | |
|drugs |11020|19| |
|dxa|5022|7| |
|family_history2008|12819|10| |
|clinical_quest|12792|239|231 food variables and 13 other variables|
|questionnaire_data|12815|85|81 covariate variables, 4 other variables|
|original_dietvar|12792|186|original variables|
|derived_dietvar|12792|61|derived variables|
|questionnaire_data_whole|12819|73|selected covariate variables|
|questionnaire_data_lean|12819|43|redundant variables removed or integrated|
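|metabolomics_POS | 7721 | 3435 | Final peak table, RP mode; the first column is SIMPkey |
| metabolomics_NEG | 7629 | 3414 | Final peak table, RN mode |
| metabolomics_VC | 7626 | 6848 | POS and NEG peak tables combined; 6848 = 3435 + 3414 - 1 is consistent with a single shared SIMPkey column |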

|list | Nu.items | item_type |Additional comment |
| -------- | -------- |-------- |-------- |
|clinical_variables_check|3|mix| |
|diseases_check|3| mix | |
|drugs_check |3|mix| |
|dxa_check|3|mix| |
|family_history2008_check|3|mix| |
|clinical_quest_check|3|mix||
|original_dietvar_check|3|mix||
|derived_dietvar_check|3|mix||
|questionnaire_check|3|mix||
|questionnaire_whole_check|3|mix||
|questionnaire_lean_check|3|mix||
### 2.3 Microbiota and food variable exploration
### 2.3.1 microbiota_explore
This step first removes near-zero-variance variables, then excludes variables in which more than 2/3 of the values are zero.
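
This note does not give the exact filter used in microbiota_explore.R. The Python sketch below shows the two steps. It assumes a near-zero-variance rule modelled on caret::nearZeroVar with its default cut-offs (freqCut = 95/5, uniqueCut = 10), followed by the zero-prevalence cut at 2/3. This is an approximation and not a port of caret.

```python
from collections import Counter


def near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Caret-style near-zero-variance flag for one variable (None = missing, ignored)."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed).most_common()  # [(value, count), ...], most frequent first
    if len(counts) <= 1:
        return True  # zero variance: at most one distinct value
    freq_ratio = counts[0][1] / counts[1][1]  # most common vs second most common
    percent_unique = 100.0 * len(counts) / len(observed)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


def zero_fraction(values):
    """Share of non-missing observations that are exactly zero."""
    observed = [v for v in values if v is not None]
    return sum(1 for v in observed if v == 0) / len(observed)


def filter_microbiota(table, max_zero_fraction=2 / 3):
    """table: {variable: [values]}. Drop near-zero-variance variables first,
    then variables where more than max_zero_fraction of the values are zero."""
    nonzv = {k: v for k, v in table.items() if not near_zero_variance(v)}
    return {k: v for k, v in nonzv.items() if zero_fraction(v) <= max_zero_fraction}


table = {
    "constant": [0] * 30,                     # one distinct value
    "rare": [0] * 29 + [5],                   # 29:1 frequency ratio, 2 distinct values
    "sparse": [0] * 21 + list(range(1, 10)),  # passes nzv, but 70 % zeros
    "common": [0] * 10 + list(range(1, 21)),  # kept
}
print(list(filter_microbiota(table)))  # ['common']
```
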
Input data files are:
`simpler_metagenomics_metaphlanalpha_diversity_v2.0.tsv`
`simpler_metagenomics_metaphlan_bray_curtis_dissimilarities_v2.0.tsv`
`simpler_metagenomics_metaphlan_bray_curtis_dissimilarities_v2.0.tsv`
`technical_variables.tsv`
|dataframe | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|alpha diversity.tsv|6150|3| |
|beta_diversity.tsv|6150|6151| |
|metaphlan_abundance.tsv|6150|7852| |
|technical_variables.tsv|6150|23| |
The script is
`microbiota_explore.R`
The output is
`micro_precessed.rda`
|dataframe | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|micro_variables_summarize|12792|239||
|micro_all|6150|7885| |
|micro_all_nonzv|6150|1753| |
|micro_lessmissing|6150|976|including simpkey and alpha diversity variables|

|list | Nu.items | item_type |Additional comment |
| -------- | -------- |-------- |-------- |
|micro_all_check|3|mix|All microbiota variables |
|micro_all_nonzv_check|3| mix | all microbiota variables that do not have near zero variance|
|micro_abundance_list|12819|85|abundance tables by taxonomic level: k__, p__, c__, o__, f__, g__, s__, t__|
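
MetaPhlAn clade names are pipe-separated lineages such as k__Bacteria|p__Firmicutes|c__Clostridia, so the rank of a clade is the prefix of its last element. The Python sketch below groups abundance columns by rank, in the spirit of micro_abundance_list. This note does not show how microbiota_explore.R actually builds that list.

```python
from collections import defaultdict

# MetaPhlAn rank prefixes, from kingdom down to the t__ level.
RANKS = ["k", "p", "c", "o", "f", "g", "s", "t"]


def rank_of(clade):
    """Return the rank letter of a clade name, taken from its last lineage element."""
    last = clade.split("|")[-1]
    prefix = last.split("__", 1)[0]
    if prefix not in RANKS:
        raise ValueError(f"unexpected clade name: {clade}")
    return prefix


def group_by_rank(clades):
    """Group clade names into {rank letter: [clade names]}, ranks in MetaPhlAn order."""
    groups = defaultdict(list)
    for clade in clades:
        groups[rank_of(clade)].append(clade)
    return {r: groups[r] for r in RANKS if r in groups}


clades = [
    "k__Bacteria",
    "k__Bacteria|p__Firmicutes",
    "k__Bacteria|p__Firmicutes|c__Clostridia",
    "k__Bacteria|p__Bacteroidetes",
]
print(group_by_rank(clades))
```
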
### 2.3.2 foodvariable_explore
This step first removes near-zero-variance variables, then excludes variables with more than 2/3 missing (NA) values.
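
The missing-value step can be sketched the same way. The minimal Python version below assumes NA is stored as None. It also assumes "more than 2/3" is a strict inequality, so a variable with exactly 2/3 missing values is kept. The variable names in the example only illustrate the g_ pattern.

```python
def drop_sparse_columns(table, max_na_fraction=2 / 3):
    """table: {variable: [values]}, with None marking NA.

    Keep variables whose share of missing values is at most max_na_fraction.
    """
    kept = {}
    for name, values in table.items():
        na_fraction = sum(v is None for v in values) / len(values)
        if na_fraction <= max_na_fraction:
            kept[name] = values
    return kept


table = {
    "g_food001": [1.0, None, 2.0],  # 1/3 missing: kept
    "g_bev02": [None, None, None],  # all missing: dropped
    "g_alc03": [None, None, 1.0],   # exactly 2/3 missing: kept
}
print(list(drop_sparse_columns(table)))  # ['g_food001', 'g_alc03']
```
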
Input data file is:
`cleanedata_tobematched_VC.rda`
The script is
`foodvariable_explore.R`
The output is
`food_variable_result.rda`
|dataframe | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|original_dietvar|12792|143| 8 of them are other variables|
|original_dietvar_nonzv|12792|175| 8 of them are other variables|
|original_dietvar_all|12792|191|8 of them are other variables|
|derived_dietvar|12792|56| |
|derived_dietvar_nonzv|12792|56| |
|derived_dietvar_all|12792|56| |

|list | Nu.items | item_type |Additional comment |
| -------- | -------- |-------- |-------- |
|original_dietvar_all_check|3|mix|all original dietary variables|
|derived_dietvar_all_check|3| mix |all derived dietary variables|
|original_dietvar_nonzv_check|3|mix|original dietary variables without near-zero variance|
|derived_dietva_nonzv_check|3| mix |derived dietary variables without near-zero variance|
|questionnaire_data_whole_check|3| mix | questionnaire data check |
### 2.4 match_data
Input data files are:
`cleanedata_tobematched_VC.rda`
`micro_precessed.rda`
`food_variable_result.rda`
The script is
`match_data_VC.R`
The output is
`matcheddata_tobeanalyzed_VC.rda`
|dataframe for variables | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|clinical_variables_match_metabolomics_VC|4898|25| |
|diseases_match_metabolomics_VC|654| 16 | |
|drugs_match_metabolomics |4160|19| |
|family_history2008_match_metabolomics|4898|10| |
|pop_match_metabolomics|681|26||
|pfas_match_metabolomics|679|9||
|pop_normalized_match_metabolomics|681|26||
|pfas_normalized_match_metabolomics|679|9||
|original_dietvar_match_metabolomics|4882|186||
|derived_dietvar_match_metabolomics|4882|61||
|questionnaire_data_whole_match_metabolomics|4898|73||
|questionnaire_data_lean_match_metabolomics|4898|43||

|dataframe for metabolomics | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|metabolomics|4898|1698| |
|metabolomics_match_clinical_variables|4898|1698| |
|metabolomics_match_diseases|654| 1698 | |
|metabolomics_match_drugs|4160|1698| |
|metabolomics_match_family_history2008|4898|1698| |
|metabolomics_match_pop|681|1698||
|metabolomics_match_pfas|679|1698||
|metabolomics_match_pop_normalized|681|1698||
|metabolomics_match_pfas_normalized|679|1698||
|metabolomics_match_derived_dietvar|4882|1698||
|metabolomics_match_original_dietvar|4882|1698||
|metabolomics_match_questionnaire_data_whole|4898|1698||
|metabolomics_match_questionnaire_data_lean|4898|1698||
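
Each `*_match_metabolomics` data frame has the same number of rows as its `metabolomics_match_*` partner, so both are restricted to the SIMPkeys they share. The Python sketch below shows that alignment. It assumes both outputs should also share the same row order, so that row i refers to the same individual in each. Tables are {simpkey: row} dictionaries, and the column names and values are toy examples.

```python
def align_on_simpkey(left, right):
    """left, right: {simpkey: row}.

    Return the rows of both tables restricted to the shared SIMPkeys, in the
    same sorted key order, together with the keys themselves.
    """
    shared = sorted(left.keys() & right.keys())
    return [left[k] for k in shared], [right[k] for k in shared], shared


# Toy tables; 'E11_BV' and 'RP_1' are hypothetical column names.
diseases = {"S2": {"E11_BV": 1}, "S1": {"E11_BV": 0}, "S9": {"E11_BV": 1}}
metabolomics = {"S1": {"RP_1": 3.2}, "S2": {"RP_1": 2.7}, "S3": {"RP_1": 4.1}}
d_rows, m_rows, keys = align_on_simpkey(diseases, metabolomics)
print(keys)  # ['S1', 'S2']
```
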
### 2.5 impute confounders
Input data files are:
`cleanedata_tobematched_VC.rda`
`matcheddata_tobeanalyzed_TessaID_VC.rda`
The script is
`confounder_imputation_VC.R`
The output is
`confounder_raw_nomiss_imputed_VC.rda`
|dataframe | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|confounder_raw|12781|9|Only rows with matched simpkeys for all confounder variables are used|
|confounder_raw_extend|12819|9|Also includes individuals without matched simpkeys for some confounders; those values are set to NA|
|confounder_nomiss|10260|9|Rows with any missing values removed|
|confounder_imputed||| The imputed result from confounder_raw_extend|
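
The first three data frames differ only in how unmatched or missing confounders are handled. In the Python sketch below, rows are dictionaries. The extend step keeps every simpkey and fills unmatched confounders with None (NA), as in confounder_raw_extend. The complete-case step keeps only rows without missing values, as in confounder_nomiss. The confounder names are hypothetical. The imputation method behind confounder_imputed is not described in this note, so it is not sketched.

```python
def extend_confounders(all_keys, sources):
    """all_keys: every simpkey; sources: {confounder: {simpkey: value}}.

    Return one row per simpkey, with None where a confounder has no match.
    """
    return {
        key: {name: values.get(key) for name, values in sources.items()}
        for key in all_keys
    }


def complete_cases(rows):
    """Keep only rows without any missing (None) confounder value."""
    return {k: r for k, r in rows.items() if all(v is not None for v in r.values())}


# Hypothetical confounders.
sources = {
    "age": {"S1": 61, "S2": 70, "S3": 58},
    "smoking": {"S1": 3, "S3": 1},  # S2 has no matched smoking value
}
extended = extend_confounders(["S1", "S2", "S3"], sources)
nomiss = complete_cases(extended)
print(extended["S2"])  # {'age': 70, 'smoking': None}
print(sorted(nomiss))  # ['S1', 'S3']
```
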
Input data file is:
`confounder_raw_nomiss_imputed_VC.rda`
The script is
`confounder_build_TessaID_VC.R`
The output is
`confounders_frame_TessaID_VC.rda`
|dataframe | Nu.obs | Nu.vars | Additional comments |
| -------- | -------- |-------- |-------- |
|confouder_MI_TessaID|||confounders with NA|
|confouder_imputed_MI_TessaID|||imputed result|
|confouder_stroke_TessaID|||confounders with NA|
|confouder_imputed_stroke_TessaID|||imputed result|
### 2.6 preclean_analysis
This is to clean up some variables, dealing with uneasy .
It also separates variables into baseline and follow-up.
Input data files are:
`matcheddata_tobeanalyzed_VC.rda`
`confounder_raw_nomiss_imputed_VC.rda`
The script is
`preclean_analysis_VC.R`
The output is
`ready_for_analysis_VC.rda`
Currently, the script runs MUVR models to select variables associated with T2D, MI and stroke.
Only individuals without a history of CVD are used in the MUVR models for MI and stroke, and only individuals without a history of T2D are used in the MUVR models for T2D.
The output contains MUVR_RF_mod, MUVR_PLS_mod, MUVR_EN_mod and
MUVR_EN_adj_mod (adjusted for age). Each item is a list of three (T2D, MI, stroke), and each sublist holds 5 MUVR model results, each obtained after subsampling the controls to the same number as the cases.
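
The Python sketch below shows the balanced subsampling. It assumes each of the 5 repetitions keeps all cases and draws, without replacement, as many controls as there are cases. MUVR is an R package and is not called here. The function name and seed are hypothetical.

```python
import random


def balanced_subsamples(cases, controls, n_repeats=5, seed=1):
    """Return n_repeats (cases, sampled_controls) pairs.

    All cases are kept; controls are drawn without replacement to match the
    number of cases, with a different draw in each repetition.
    """
    if len(controls) < len(cases):
        raise ValueError("fewer controls than cases")
    rng = random.Random(seed)
    return [(list(cases), rng.sample(list(controls), len(cases)))
            for _ in range(n_repeats)]


cases = [f"case{i}" for i in range(4)]
controls = [f"ctrl{i}" for i in range(20)]
for case_ids, control_ids in balanced_subsamples(cases, controls):
    print(len(case_ids), len(control_ids))  # 4 4 on each of the 5 lines
```
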
### 2.7 Outcome_r_f_s_
Input data file is from 2.6:
`ready_for_analysis_VC.rda`
The script is
`outcome_r_f_s_VC.R`
It checks the number of selected variables (nVar) and which variables are selected in 3 out of 5 runs with different subsamples of controls.
The output is
`outcome_related_features_result_VC.rda`

The results intended for downstream use could be `T2D_variable_RF_mid`, `MI_variable_RF_mid` and `stroke_variable_RF_mid`.
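
The "3 out of 5" rule amounts to counting how often each variable is selected across the 5 subsampled models. The Python sketch below assumes the rule means at least 3 of the 5 runs. It also assumes each run's selection is available as a set of variable names. The feature names are hypothetical.

```python
from collections import Counter


def stable_features(selections, min_count=3):
    """selections: one collection of selected variable names per subsampling run.

    Return the variables selected in at least min_count runs, sorted by name.
    """
    counts = Counter(name for selected in selections for name in set(selected))
    return sorted(name for name, n in counts.items() if n >= min_count)


runs = [
    {"feat_a", "feat_b", "feat_c"},
    {"feat_a", "feat_c"},
    {"feat_a", "feat_d"},
    {"feat_a", "feat_c", "feat_d"},
    {"feat_b"},
]
print(stable_features(runs))  # ['feat_a', 'feat_c']
```
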
### 2.8 Exposure_Outcome_r_f_s_
Input data files are:
`outcome_related_features_result.rda`
`matcheddata_tobeanalyzed.rda`
The script is
`exposure_outcome_r_f_s.R`
The output is
`exposure_outcome_related_features_result.rda`
### 2.9 after_E_O_features_selected
Input data files are:
`exposure_outcome_related_features_result.rda`
`matcheddata_tobeanalyzed.rda`
The script is
`after_E_O_features_selected.R`
The output is:
## 3. SIMPLER-VC data cleaning
### 3.1 get_rawdata_VC
### 3.2 clean_data_VC
### 3.3 normalized_OCmatch_data_VC
### 3.4 microbiota_explore
### 3.4.1 food_variable_explore
### 3.5 match_data
### 3.6 preclean_Analysis
### 3.7 Outcome_r_f_s_
## 4. Match omics
### 4.1 Match selected outcome-related metabolomics features in SIMPLER-UC with SIMPLER-VC
Check for overlap and get selected features
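
One common way to check overlap between two untargeted LC-MS feature lists is to pair features whose m/z values agree within a ppm tolerance and whose retention times agree within a fixed window. The tolerances below (5 ppm, 0.1 time units) are placeholders, not values from this project. Retention times are only comparable if both cohorts were run on the same chromatographic method. The feature names are hypothetical.

```python
def ppm_difference(mz_a, mz_b):
    """Relative m/z difference in parts per million, relative to mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6


def match_features(features_a, features_b, ppm_tol=5.0, rt_tol=0.1):
    """features_*: {name: (mz, rt)}.

    Return (name_a, name_b) pairs whose m/z and retention time both fall within
    the tolerances; for each feature in A keep the B candidate with the
    smallest ppm difference.
    """
    matches = []
    for name_a, (mz_a, rt_a) in features_a.items():
        candidates = [
            (ppm_difference(mz_a, mz_b), name_b)
            for name_b, (mz_b, rt_b) in features_b.items()
            if ppm_difference(mz_a, mz_b) <= ppm_tol and abs(rt_a - rt_b) <= rt_tol
        ]
        if candidates:
            matches.append((name_a, min(candidates)[1]))
    return matches


uc = {"UC_f1": (180.0634, 2.31), "UC_f2": (256.2402, 8.90)}
vc = {"VC_f7": (180.0640, 2.35), "VC_f9": (256.2402, 9.40)}
print(match_features(uc, vc))  # [('UC_f1', 'VC_f7')]
```

UC_f2 and VC_f9 share the same m/z but are 0.5 apart in retention time, so they are not paired.
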
### 4.2 linear models to check SIMPLER-UC features in SIMPLER-VC, and vice versa
### 4.3 same as Exposure_Outcome_r_f_s_ in SIMPLER-UC
### 4.4 Match selected outcome-related metabolomics features in SIMPLER-VC with other cohorts
Check for overlap and get selected features
### 4.5 linear models to check SIMPLER-UC features in other cohorts