# Introduction to R - Refactoring the Workshop
Refactor existing workshop for smaller video segments and flipped presentation
1. Download software & run locally
- [R](https://cran.r-project.org/) & [RStudio](https://rstudio.com/products/rstudio/download/) are not the same thing
- [RStudio keyboard Shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts)
- Cloud Options
- https://rstudio.cloud
- https://vm-manage.oit.duke.edu/containers
- R v Python
- R is a _data first_ (i.e. analysis) programing language
- Python is a general programming language
- Are there libraries/packages relevant to your work?
- Is there a supportive community?
- [Rfun](https://rfun.library.duke.edu/)
- [R Community](https://community.rstudio.com/)
- [R Ladies](https://rladies.org/) | Locally, [R Ladies, RTP](https://www.meetup.com/rladies-rtp/)
- ML leans towards Python, [or does it](https://www.tidymodels.org/learn/)?
- Programmer/Coder v Analyst
- [how to choose a programming language](https://medium.com/better-programming/how-to-choose-a-programming-language-for-a-project-7c7a3e5a4de6)
- [how to choose a religion](https://www.wikihow.com/Find-the-Right-Religion-for-You)
1. An RStudio project & reproducibility
- **You are your most frequent collaborator** separated by time
- **A simple test**: Identify specific computational steps from a six-month old project?
- **A simple goal**: Reproduce your computation on a different computer
- Initial **Reproducibility in a nutshell**
- Do everything with a script
- Avoid point & click
- Use relative paths
- Write your code to run on any similar environment
- Read more: [Initial steps toward reproducible research.](https://kbroman.org/steps2rr/) Karl Broman
- **RStudio Projects enable _Reproducibility_**
1. **Relative files paths**
- `read_csv("data_raw/raw_data.csv")`
- ProTip: `..` to move up one level in the directory structure
- Avoid absolute paths
- avoid `setwd()`
- e.g. `setwd("d:/rfiles/myrproject")`
4. **_Restart R and run all chunks_**
- avoid: `rm(list=ls())`
6. R Markdown & **literate coding**
- A script integrates code and natural language
- Explain and describe your analysis within your workflow
- Render reports in multiple formats
- Notebooks, slide decks, web pages, dashboards, e-books, journal articles
7. **File structure** matters
- [_Practice of Reproducible Research_](https://www.practicereproducibleresearch.org/) by Kitzes, Turek, Deniz
```
EXAMPLE File Structure...
project_name (folder)
|-- project_name.Rproj
|-- README.md
|-- license.txt
| data_raw
| |-- raw_data.csv
| |-- README.txt
| data_clean
| code_source
| |--data_cleaning.Rmd
| |--analysis.Rmd
| images
| reports_results
```
3. Get Data & Code Repository
- Access your own data file (e.g. CSV)
- [Download](https://github.com/libjohn/intro2r-code) & Expand a GitHub repository
- Click on *.Rproj
4. Tour of the RStudio environment
- Create a blank project
- Console ; Packages ; Help ; Data ; Environment
- Script Editor & R Markdown
- Switch to your other project (from Section 2)
- [Keyboard Shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts)
1. Tidyverse & other library packages
- Packages extend the functionality of base R into your domain
Practice | Frequency | Command
-------- | --------- | -------
Install | once | `install.packages("tidyverse")`
Load | each time | `library(tidyverse)`
- [**Tidyverse**](https://tidyverse.org):
1. an opinionated collection of packages with consistent web-based documentation and a supportive community
2. a Meta-package that loads 8 helper packages and installs many consistent utilities
Name | Purpose
--- | ---
readr | importing CSV data
dplyr | transforming data
ggplot2 | visuazlizing
tibble | rectangular grid / data frame
tidyr | pivot
forcats | categorical data / factors
stringr | string data / manipulate natural language
purrr | iteration
- **Other package repositories**
- [CRAN](https://cran.r-project.org/web/packages/)
- [Tidyverse Packages](https://www.tidyverse.org/packages/)
- [BioConductor](https://www.bioconductor.org/packages/release/BiocViews.html#___Software)
- [ROpenSci](https://ropensci.org/)
- [MetaCran](https://www.r-pkg.org/)
1. Demo & R Markdown
- Base R, in the console
- A big calculator
- RStudio & Tidyverse
- RMarkdown Notebook: reproducible literate coding = Prose + Coding + Reports + RStudio projects + version control (git/GitHub)
- [R Markdown](https://rmarkdown.rstudio.com/lesson-1.html)
- Code Chunks: Separate prose from code
- [Literate Programming](https://en.wikipedia.org/wiki/Literate_programming) (e.g. Jupyter notebooks, R notebooks)
- [R Markdown Cheatsheet](https://rmarkdown.library.duke.edu/slides/index.html#5)
- Integrate a natural language explanation of your analysis along with your code snippets. An approach used within computational sciences to create a functional record of *reproducibile* research. You are your most frequent collaborator, six months from now, or six months ago.
- Create a new, blank R Notebook
- R **Notebook** v. R Markdown **Document[s]**
- Many types of R Reports
- **slides**, documents, web pages, **e-books**, journal articles, web sites, **dashboards**, interactive **HTML widgets**, etc.
- Save the file (`*.Rmd`)
- Preview the file
- (Share with others, so they don't have to recreate your compute environment.)
1. [`dplyr`](https://dplyr.tidyverse.org/) package
- "A grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges"
- `mutate()` adds new variables
- `select()` pick variables / columns
- `filter()` subset data by row
- `summarise()` reduces multiple values into a summary
- `count()` a special case of `summarize()` to tally occurances
- `arrange()` sort rows
- [RStudio Keyboard Shortcuts](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts)
1. [`tidyr`](https://tidyr.tidyverse.org/) package
- Make messy data into **tidy data**
- Every variable is a column
- Every row is an observation
- Every cell is a single value
- i.e. **pivots**
- People who like `pivot_longer()` also like `dplyr::left_joint()`
1. Exploratory Data Analysis (EDA): `ggplot2()` & `skimr`
- `skimr::skim()` from library[(skimr)](https://docs.ropensci.org/skimr/)
- ggplot2(): a brief overview of visualization
1. [`ggplot2()`](https://ggplot2.tidyverse.org/): an introduciton to the grammar of graphics, & interactive plots via `plotly`
1. R Markdown
---
20. Large Data
25. Regression
30. Dashboards
35. Slides (Xaringan)
40. Mapping and Geocomputation / Spatial Analysis
45. Version Control: git and GitHub
## Quick Start
1. make a folder
2. drag starwars.csv
3. Make existing folder and RStudio project
4. Open an R Markdown Notebook
5. `library(tidyverses)`
6. `read_csv(starwars.csv)`
7. `ggplot(data = starwars, aes(hair_color)) + geom_bar()`
8. `summary(starwars)`
9. `skimr(starwars)`
10. `left_join(starwars, fivethirtyeight)`
11. `summarize(gender)`
12. Transform data: five dplyr verbs ...
13. `count` / `group_by` & `summarize`
14. Make barchart an interactive ggplotly
15. Quick Linear Regression
16. Save Notebook report, and as MSWord file
## Learning resources
- RStudio Primers
- DataQuest.io
- Master the Tidyverse
- Rfun
- R for Data Science
- Grolemund practical programming
- ggplot2
- Text Mining by Silge/Robinson
- Shiny
- Plotly for R