SOP - Files structuring

Convention on organizing data files in folders

1. "Data"
   - "Raw_data": In-data/raw data (unmodified; do not tamper with it)
   - "Intermediate_data": intermediate data (after some modifications, e.g., imputation of missing values, conversion of variables from numeric to factor, etc.)

   #Important:
   - Codebook for each data file: description of each variable, data type (numeric, factor, integer, etc.), units, number of data points (gives an at-a-glance view of the number of missing values; imputation method for missing values)
   - see check-in SOP here (link - Elise)
   - see template for codebook here (link - Yan? https://docs.google.com/spreadsheets/d/1wlr6TJwUqLq5AZNQUcnvG0kd2rTsVp5o/edit#gid=927871402)
   - convention on the variable name here (link)
   - keep this "Data" folder separate so that it is not transferred, e.g., out of the BIANCA environment for sensitive information

2. "Output": to store the results of the analyses
   - the content of this folder can be organized differently for each project, e.g., based on the analyses performed, datasets, or other structures
   - depending on your preference, you can also put file versioning in the file name, e.g., based on date in YYMMDD format (to make sorting easier)

   #Important:
   - write a readme file after completion of the project, so that the "reader" or any collaborator invited to the project can get up to speed faster
   - see the SOP on checkout for project transfer here (link - Elise)

3. "R_scripts": keep all R scripts in a separate folder
   - keep only one function per .R file, to ease transfer of R scripts from one project to another
   - see convention on script standardization here (link - Viktor)

# PS: creativity and variation are allowed within the FAIR principles

Version 1.1 - Stef (220610)
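
As an illustration only, the folder layout and the date-based file versioning described in this SOP could be set up in R roughly as follows. The function names `create_project_skeleton` and `versioned_name` and the `project_root` argument are hypothetical, not part of the SOP. Both helpers are shown together for brevity; under the "one function per .R file" convention, each would live in its own file in "R_scripts".

```r
# Hypothetical helper (name and arguments are assumptions, not SOP-mandated):
# create the folder skeleton described in this SOP under `project_root`.
create_project_skeleton <- function(project_root) {
  folders <- c(
    file.path("Data", "Raw_data"),          # untouched in-data
    file.path("Data", "Intermediate_data"), # data after modifications
    "Output",                               # results of the analyses
    "R_scripts"                             # one function per .R file
  )
  for (folder in folders) {
    # recursive = TRUE also creates "Data" and `project_root` if missing;
    # showWarnings = FALSE keeps the call quiet when a folder already exists
    dir.create(file.path(project_root, folder),
               recursive = TRUE, showWarnings = FALSE)
  }
  # Return the created paths invisibly so callers can check them
  invisible(file.path(project_root, folders))
}

# Hypothetical helper: prefix a file name with a date in YYMMDD format,
# so that files sort chronologically by name.
versioned_name <- function(name, date = Sys.Date()) {
  paste0(format(date, "%y%m%d"), "_", name)
}
```

For example, `create_project_skeleton("my_project")` would create the four folders under `my_project`, and `versioned_name("results.csv", as.Date("2022-06-10"))` returns `"220610_results.csv"`. Putting the date first in the file name is one possible choice; the SOP leaves the exact naming scheme to the project.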