## (how not to accidentally delete your files)
Written by Sarah Teichman for STAMPS 2023.
### File organization
There are a lot of different ways to organize your files. This is a method that works for me, but you might have a different method that works better for you!
Imagine I'm running a data analysis. I will start with one overall folder **data-analysis**. My **data-analysis** folder will have several subfolders:
* **data**: this will hold my data.
    * For example, this may hold separate metadata and OTU data frames as well as a `phyloseq` object that I made from that data.
* **scripts**: this will hold any scripts (R or shell or python or ...) that I use to analyze my data.
    * For example, the code that I use to generate my `phyloseq` object would go here.
* **figures**: this will hold any figures that I generate during my analysis.
    * For example, if I make a PCoA plot from my `phyloseq` object I would save it here.
* **output**: this will hold any output that I want to save from my analysis.
    * For example, if I calculated a dissimilarity matrix between each of my samples and wanted to export it to use in another program I would save it here.
    * (I only sometimes use this subfolder.)
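
If you prefer the command line, the layout above can be set up in a few lines. This is only a minimal sketch: the folder names come from the list above, and it assumes you run it from wherever you want the **data-analysis** folder to live.

```shell
# Create the top-level folder and its four subfolders.
# -p also creates data-analysis itself and does nothing if a folder already exists.
for sub in data scripts figures output; do
  mkdir -p "data-analysis/$sub"
done

# List the result to check the layout.
ls data-analysis
```

Creating the same folders by hand in your file browser works just as well.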

Here's an example of my file organization for a data analysis of phylogenetic trees!
### RStudio projects
Imagine you have two data analyses you are working on. One day you are analyzing the diversity of algae in different oceans :ocean:; the next day you want to study the differential abundance of bacteria in soil :seedling: (you have a broad range of interests).
One problem... each time you switch back and forth between the ocean algae and soil bacteria you have to switch your working directory in R. What a pain :unamused:!
Enter: RStudio projects!!!!
What is an RStudio project?
* A way to keep all the files for each analysis together
* A way to set your **data-analysis** folder as the working directory for each different analysis
How do I create an RStudio project?
* File -> New Project
* Then choose where to start:
    * in a new directory (set up the **data-analysis** folder and subfolders from scratch)
    * in an existing directory (you already have your **data-analysis** folder and subfolders and want to create an associated RStudio project)
    * from Version Control (we'll return to this soon!)
How do I move between RStudio projects?
* Simply close one project and open another
* No need to change working directories: the working directory is always the folder that is associated with your project
Why would I use RStudio projects?
* It will *almost certainly* make your file organization in R easier and less painful :relaxed:!
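
To see why the working directory matters, here is a small shell sketch. The `wd-demo` folder and the file contents are hypothetical, and `cd` stands in for what RStudio does for you when you open a project (in R itself, `getwd()` reports the current working directory).

```shell
# Hypothetical demo folder holding two separate analyses.
mkdir -p wd-demo/ocean-algae/data wd-demo/soil-bacteria/data
echo "ocean samples" > wd-demo/ocean-algae/data/metadata.csv
echo "soil samples" > wd-demo/soil-bacteria/data/metadata.csv

# The relative path data/metadata.csv is resolved against the working
# directory, so the same command reads a different file in each project.
# The parentheses run each cd in a subshell, leaving your own working
# directory unchanged.
(cd wd-demo/ocean-algae && cat data/metadata.csv)
(cd wd-demo/soil-bacteria && cat data/metadata.csv)
```

Because each project sets its own working directory, a script can keep using short relative paths like `data/metadata.csv` no matter which analysis you have open.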
### R and version control via GitHub
Do your files look like this?
* final_report.pdf
* final_report1.pdf
* final_report8.pdf
* final_report-09-25.pdf
* final_report-09-28.pdf
* final_report_for_real_this_time.pdf
**You could probably use version control!**
Do you find that you and your collaborators are constantly sending emails back and forth with updated versions of code that you are jointly working on?
**You could probably use version control!**
**Version control** is the practice of tracking and recording any changes to code or software. It creates a history of changes and lets users access older versions of code. It also lets multiple people work on the same project and always have access to the most up-to-date version.
A very common version control system is called **git**, and it is often used through the web-based hosting site called **GitHub**. You can think of them like R and RStudio: just as we interact with R via RStudio, we interact with git via GitHub.
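
To make "a history of changes" concrete, here is a minimal command-line sketch, assuming git is installed. The `report-demo` folder, the file name, and the demo identity are all hypothetical, and nothing here touches GitHub; it only shows git tracking two versions of one file on your own computer.

```shell
# Start a fresh, hypothetical repository.
mkdir -p report-demo
cd report-demo
git init -q

# Identity used only inside this demo repository.
git config user.name "Demo User"
git config user.email "demo@example.com"

# First version of the report.
echo "draft" > final_report.txt
git add final_report.txt
git commit -q -m "Add first draft of report"

# Second version: same file name, no final_report1, final_report8, ...
echo "revised draft" > final_report.txt
git commit -q -am "Revise report"

# The history lists both versions, newest first.
git log --oneline

# The older version can still be read back.
git show HEAD~1:final_report.txt

# Step back out of the demo repository.
cd ..
```

There is one file on disk, but both versions stay available through the history.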
A few things to know about GitHub:
* A "repository" or "repo" in GitHub is essentially your top-level **data-analysis** folder. It should be self-contained. Each repo will be tracked via version control.
* git and GitHub play really well with RStudio!
Remember when I told you that you could choose to create a new RStudio project with the option "Version Control"? This lets you link up a GitHub repository with an RStudio project. This is a seamless (well, kinda) way to integrate version control into your analyses!
We won't go through setting up GitHub and getting ready to use it with RStudio in this course, but here are some resources for you to get started if you'd like to.
* [Notes](https://svteichman.github.io/STAT302-SPR2022/files/slides/08_debuggit.html#42) on using GitHub through RStudio from a course that I taught (much of the material borrowed from Bryan Martin, a previous STAMPS TA :tada:).
* [More comprehensive notes](https://happygitwithr.com/).