---
tags: math615
robots: noindex, nofollow
---

# Collaborative R Notes

This set of notes is for students at Chico State to help build a reference for themselves and future users on how to do tasks in R.

# General resources

* https://norcalbiostat.github.io/MATH130/faq.html
* https://datacarpentry.org/R-genomics/index.html

# Script Files

* A script file with a `.R` file type is mainly R code; all discussion and comments need to be written in `# comment` form.
* An **Rmarkdown** (`.Rmd`) file is like a lab notebook where you can write R code alongside written notes.
    * See [Math 130 Week 1 lesson 03](https://norcalbiostat.github.io/MATH130/wk1.html) or the [RStudio tutorial page](https://rmarkdown.rstudio.com/lesson-1.html) for an overview.
* A [Quarto](https://quarto.org/) (`.qmd`) file is the new generation of Rmarkdown files. :grinning_face_with_star_eyes: See the [Hello Quarto](https://quarto.org/docs/get-started/hello/rstudio.html) tutorial to get started.

# Packages / Libraries

A _package_ is a collection of functions that can be used to do things like create graphics or run advanced models. To get access to these functions you have to _load_ the package by typing

```{r}
library(packagename)
```

where `packagename` is the name of the package you are trying to load. Examples include `tidyverse`, `here` and `janitor`.

# Import and preprocessing

* External data files are often saved as text files (`.txt`), comma separated values (`.csv`) or as Excel files (`.xlsx`).

## Import Data into R

If my data set is named **AddHealth_Wave_IV.csv** and it is located in my data folder, then it can be read into R using the `read_csv` function (loaded with `tidyverse`) together with the `here` function (loaded with `here`).

```{r}
raw <- read_csv(here("data", "AddHealth_Wave_IV.csv"))
```

Here I am assigning the data set to the name `raw`.
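After importing, it can help to confirm that the data came in the way you expected. The sketch below is only an illustration: it writes a tiny made-up file to a temporary location (the file and its values are hypothetical stand-ins, not the real AddHealth data) so that it runs on its own, then reads it with `read_csv` and looks at the result.

```{r}
library(readr)  # provides read_csv(); also loaded by library(tidyverse)

# Hypothetical stand-in file so this example runs without the real data
tmp <- tempfile(fileext = ".csv")
writeLines(c("BIO_SEX,H4TO1,H4TO2",
             "1,0,NA",
             "2,1,3"), tmp)

demo <- read_csv(tmp, show_col_types = FALSE)

dim(demo)    # number of rows, then number of columns
names(demo)  # variable names
head(demo)   # first few rows
```

With your own data, replace `tmp` with the `here("data", "yourfile.csv")` call and run the same three checks on `raw`.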
## Only read in selected variables

Use `select()` to keep only the variables you need from the imported data.

```{r}
mydata <- raw %>% select(BIO_SEX, H4TO1, H4TO2)
```

## Check the data type

```{r}
class(mydata$H4TO1)
```

## Export R formatted data

```{r}
# save() needs the output path passed through the `file` argument
save(mydata, file = here("data", "addhealth_clean.Rdata"))
```

## Nice settings for your rendered document

At the top of your `qmd` file, in the YAML header, add the following:

```md
title: "Describing Relationships between variables"
date: "2022-09-19"
author: "Robin Donatello"
format: pdf
execute:
  echo: true
```

## Hide warning messages

#### For a single code chunk:

Inside the code chunk, at the top, use the hash-pipe (`#|`) to set code chunk options.

```r
#| warning: false
#| message: false
table(mpg$class)
```

#### For the entire document:

At the top of your `qmd` file, in the YAML header under `execute`, add lines to disable `warning` and `message`.

```md
execute:
  echo: true
  warning: false
  message: false
```

## Applying labels to levels of a categorical variable

```{r}
depress$marital <- factor(depress$marital,
                          labels = c("Never Married", "Married", "Divorced",
                                     "Separated", "Widowed"))
```

## Force a page break

Add `\newpage` on a blank line.

----

# Data Exploration and Visualization

## Side by side plots

This uses div tags. Copy it exactly, then insert your own code chunks.

```verbatim=
:::: {.columns}

::: {.column width="50%"}
code chunk to create plot 1
:::

::: {.column width="50%"}
code chunk to create plot 2
:::

::::
```

## Expand the y axis so that your numbers can be seen

Expand the y-axis to a number a little larger than your max value using `ylim(lower, upper)`. The `plot_frq` function used here comes from the `sjPlot` package.

```{r}
plot_frq(mpg$class) + ylim(c(0,100))
```
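
If you are not sure what "a little larger than your max value" should be, one option is to compute the upper limit from the data instead of guessing. This is a minimal sketch, assuming a plain `ggplot2` bar chart in place of `plot_frq` and an arbitrary 10% of headroom; the names `max_n`, `upper` and `p` are made up for the example.

```{r}
library(ggplot2)  # provides the mpg data set, geom_bar() and ylim()

# Largest count among the levels of class
max_n <- max(table(mpg$class))

# Leave about 10% of headroom above the tallest bar (assumed amount)
upper <- max_n * 1.1

p <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  ylim(0, upper)
p
```

Because `plot_frq` returns a ggplot object, the same `+ ylim(0, upper)` can be added to it as well.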