# SOP - File structuring
Convention on organizing data files in folders
1. "Data"
- "Raw_data": input (raw) data, stored exactly as received; never modify these files
- "Intermediate_data": intermediate data, i.e., data after some modification (e.g., imputation of missing values, conversion of variables from numeric to factor, etc.)
#Important:
- Provide a codebook for each data file: a description of each variable, its data type (numeric, factor, integer, etc.), its units, and the number of data points (which shows at a glance how many values are missing), plus the imputation method used for missing values
- see the check-in SOP here (link - Elise)
- see the codebook template here (link - Yan? https://docs.google.com/spreadsheets/d/1wlr6TJwUqLq5AZNQUcnvG0kd2rTsVp5o/edit#gid=927871402)
- see the convention on variable names here (link)
- keep this "Data" folder separate from the rest of the project so that sensitive information is not transferred by accident, e.g., out of the BIANCA environment
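A first draft of such a codebook can be generated from the data file itself and then completed by hand. The sketch below is only an illustration in Python, and the same idea can be written in R. It assumes the data file is a CSV with a header row; the function names `build_codebook` and `write_codebook`, the column names, and the set of missing-value markers are hypothetical choices, not taken from the linked template.

```python
import csv

# Assumed markers for a missing value; adapt to the data set at hand.
MISSING = {"", "NA", "NaN", "nan"}


def infer_type(values):
    """Guess a coarse data type from the non-missing string values."""
    if not values:
        return "unknown"
    try:
        [int(v) for v in values]
        return "integer"
    except ValueError:
        pass
    try:
        [float(v) for v in values]
        return "numeric"
    except ValueError:
        return "character"


def build_codebook(data_file):
    """Return one codebook row (a dict) per variable in a CSV file."""
    with open(data_file, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        variables = reader.fieldnames or []

    codebook = []
    for name in variables:
        # Keep only the values that are actually present for this variable.
        present = [
            row[name].strip()
            for row in rows
            if row[name] is not None and row[name].strip() not in MISSING
        ]
        codebook.append({
            "variable": name,
            "type": infer_type(present),
            "n_total": len(rows),
            "n_missing": len(rows) - len(present),
            # The fields below cannot be derived from the data; fill in by hand.
            "description": "",
            "units": "",
            "imputation_method": "",
        })
    return codebook


def write_codebook(codebook, out_file):
    """Write the codebook rows to a CSV file next to the data file."""
    fields = ["variable", "type", "n_total", "n_missing",
              "description", "units", "imputation_method"]
    with open(out_file, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(codebook)
```

The inferred type is only a guess based on the text in the file (a factor coded as 0/1 would be reported as integer), so the generated rows should be reviewed before the codebook is shared.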
2. "Output": to store the results of the analyses
- the content of this folder can be organized differently for each project, e.g., based on the analyses performed, datasets, or other structures
- if you prefer, you can also include a version in the file name, e.g., a date in YYMMDD format (which makes sorting easier)
#Important:
- write a readme file once the project is completed, so that a "reader" or any collaborator invited to the project can get up to speed faster
- see the SOP on checkout for project transfer here (link - Elise)
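Date-based versioning of output files can be automated with a small helper. This is a minimal sketch in Python for illustration; the function name `versioned_name` and the choice to put the date in front of the name (rather than at the end) are assumptions, made here because a leading YYMMDD stamp sorts chronologically in a file listing.

```python
from datetime import date
from pathlib import Path


def versioned_name(filename, on=None):
    """Prefix a file name with a YYMMDD date stamp.

    `on` is the date to use; it defaults to today's date.
    """
    stamp = (on or date.today()).strftime("%y%m%d")
    return f"{stamp}_{Path(filename).name}"
```

For example, `versioned_name("results.csv", date(2022, 6, 10))` gives `220610_results.csv`.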
3. "R_scripts": keep all R scripts in a separate folder
- keep only one function per .R file
- this eases the transfer of R scripts from one project to another
- see convention on script standardization here (link - Viktor)
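The folder layout described above can be created for a new project with a short script. The sketch below is an illustration in Python and could equally be written in R; it assumes that "Raw_data" and "Intermediate_data" are subfolders of "Data", and the function name `create_project_structure` is hypothetical.

```python
from pathlib import Path

# Folder layout from this SOP, relative to the project root.
FOLDERS = [
    "Data/Raw_data",
    "Data/Intermediate_data",
    "Output",
    "R_scripts",
]


def create_project_structure(project_root):
    """Create the SOP folders under project_root and return their paths.

    Existing folders are left untouched, so the function is safe to re-run.
    """
    root = Path(project_root)
    created = []
    for relative in FOLDERS:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project_structure("my_project")` creates the four folders under `my_project`; the project name is a placeholder.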
# PS: creativity and variation are allowed within the FAIR principles
Version 1.1 - Stef (220610)