## Customization
- Use the four-pane icon in the toolbar to change the layout
- Click through the panes to see all the options
- Or use Tools > Global Options
## Notes
- If the console is stuck showing a `+` prompt, hit Escape
- To get a new file:
  - File > New File
  - Or the paper-with-a-plus icon
  - Pick either R Script or R Markdown
## Markdown
- The YAML header: YAML stands for "YAML Ain't Markup Language" (originally "Yet Another Markup Language")
  - This is the section between the `---` lines
  - Don't delete the dashes!
  - Edit below the dashes.
- Code chunks: the sections shaded grey (or just a different color)
  - Anything that looks like this:
```
{r} # code chunk
```
- Code goes into this section
- Leave the chunk labeled `setup` alone
- To get a new code chunk:
  - Click the little green `+C` icon in the middle of the top editor (source) window
  - Click the R option
```
x <- 1 # assign 1 to x (preferred R style)
x = 1  # also assigns 1 to x; <- is the convention
```
- To run code:
  - Ctrl + Enter runs the line you are on
  - Click the green arrow on the chunk to run the whole chunk
  - Highlight what you want > click Run at the top menu > Run Selected Lines
- Knitting:
  - This option creates the output report
  - It runs in a fresh session, so it completely ignores anything you've done interactively while working
  - Chunk order is super important!
- Notes:
  - Type notes outside of chunks
  - Use `#` inside chunks to make comments
```
mean(c(3, 4, 5),
     na.rm = T) # this is a comment
```
## Values
```
4 # numeric
TRUE # logical
FALSE # logical
"characters" # character
NA # logical, missing data
NaN # numeric, "not a number"
```
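You can check the type of any value with `class()`; a quick sketch, which also shows why the list above calls `NaN` numeric rather than logical:

```r
class(4)            # "numeric"
class(TRUE)         # "logical"
class("characters") # "character"
class(NA)           # "logical" -- the default NA is a logical value
class(NaN)          # "numeric" -- "not a number" is still a numeric type
```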
## Objects
- `c()`: concatenate, combine
- If you save something as an object, it *usually* doesn't print out
- If something prints, it probably didn't save
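The save-versus-print rule can be checked directly; wrapping an assignment in parentheses is a trick that both saves and prints (the names `y` and `z` here are just examples):

```r
y <- 10   # saved as an object: nothing prints
y         # typing the name prints it
(z <- 20) # parentheses: saves AND prints in one step
```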
```
# variable (a single value or small vector)
x <- 5
# vector: one row of values, all the same type
pizza <- c("cheese", "pepperoni", "pineapple")
pizza
# data frame: basically an Excel sheet (rows and columns)
salad <- data.frame(
  dressing = c("italian", "ranch"),
  toppings = c("cheese", "croutons"),
  orders = c(2, 3)
)
salad
# factors are fancy vectors with labeled categories
drinks <- factor(
  x = c(3, 4, 5, 3, 4, 3, 4), # the data
  levels = c(3, 4, 5), # the possible values
  labels = c("Coke", "Pop", "Soda") # the label you want for each value
)
drinks
# lists can hold different kinds of objects, like a grocery list
dinner <- list(
  pizza,
  salad,
  drinks
)
dinner
```
## Functions
```
mean(c(1,2,3,NA))
mean(x = c(1,2,3,NA), na.rm = TRUE)
# x and na.rm are arguments
# mean is the function
```
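One more detail worth knowing about arguments: unnamed arguments are matched by position, while named arguments can go in any order. A small sketch:

```r
# positional: the first argument is matched to x
mean(c(2, 4, 6))
# named: order doesn't matter once arguments are named
mean(na.rm = TRUE, x = c(2, 4, 6, NA))
```

Both calls above return the same mean of 4.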
Tip! Use the down-arrow icon on a chunk to run everything above it, then the play button to run that chunk.
## Slicing
- A better name for this is filtering, selecting, subsetting: picking only certain things
- You can use `:` to indicate start THROUGH stop
- You must use `c()` (combine) to pick non-sequential positions
```
# just want the first one
drinks[1]
# take the first 3
drinks[1:3]
# take specific non-sequential ones
drinks[c(1,3,5)]
```
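R also supports negative indices, which drop elements instead of keeping them; a small sketch (re-creating the `pizza` vector so the example is self-contained):

```r
pizza <- c("cheese", "pepperoni", "pineapple")
pizza[-1]       # everything EXCEPT the first element
pizza[-c(1, 3)] # drop the first and third
```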
## Logicals
```
2 == 4 # are they equal? (note the double equals)
2 != 4 # are they NOT equal
2 < 4 # less than
# > greater than
# >= greater than or equal to
# <= less than or equal to
```
```
# this returns a logical vector of TRUEs and FALSEs
drinks == "Coke"
# slicing with that logical vector selects only those elements
drinks[ drinks == "Coke" ]
drinks[ drinks != "Coke" ]
```
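Logical tests can also be combined inside the brackets with `&` (and), `|` (or), and `!` (not); a sketch reusing the `drinks` factor from earlier:

```r
drinks <- factor(
  x = c(3, 4, 5, 3, 4, 3, 4),
  levels = c(3, 4, 5),
  labels = c("Coke", "Pop", "Soda")
)
drinks[drinks == "Coke" | drinks == "Pop"] # either value
drinks[!(drinks == "Soda")]                # ! negates a test
```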
## Libraries
- Click the Packages tab in the bottom-right window pane
- Install `car`, `psych`, `dplyr` (or run `install.packages(c("car", "psych", "dplyr"))` in the console; install once per machine, but `library()` every session)
```
library(car)
library(dplyr)
```
# Training Notes
- `##` makes a new slide
- How many slides? 7-10
- How long to talk? Try to keep it under 20 minutes
- For the assignment markdown:
  - Leave the libraries and data loading at the top
  - About 3 exercises
# Descriptive Statistics
## Libraries
```
library(rio)
library(psych)
library(car)
library(dplyr)
```
## Data
```
# if the file is in the same folder as the Rmd, you can just type the file name
DF <- import("03_descriptives_data.csv")
head(DF)
```
NOT IN THE MARKDOWN, but in the console: `View(DF)` opens a spreadsheet-style viewer.
```
summary(DF)
```
- Histograms:
  - x-axis: the possible values of a continuous variable
  - y-axis: the frequency of each value
```
# dataframe name dollar sign column name
# look in this dataframe here's the column name
hist(DF$accuracy)
# tidyverse style
DF %>% # Ctrl+Shift+M inserts the pipe
  pull(accuracy) %>% # pull() selects one column and turns it into a vector
  hist() # now make a histogram
```
```
describe(DF)
```
- Point estimates: single values that represent the data
- Variability estimates: values that represent the spread of the data
- Having both is important: one value that represents everything can be misleading, so an understanding of how different people can be is also useful
- `na.rm = T` excludes missing scores before calculating your statistic
  - `mean(c(1,2,3,NA), na.rm = T)`
```
mean(DF$self_perceived_knowledge, na.rm = T)
median(DF$self_perceived_knowledge, na.rm = T)
table(DF$FINRA_score) # the mode is 4
# first row: the values
# second row: frequency counts
DF %>% # this is the pipe (Ctrl+Shift+M / Cmd+Shift+M)
  summarize(mean_self = mean(self_perceived_knowledge, na.rm = T),
            med_self = median(self_perceived_knowledge, na.rm = T))
# new variable name = what you want to calculate
DF %>%
  group_by(FINRA_score) %>% # group_by() lets you calculate things by group
  summarize(mode_self = n()) # n() counts the number of rows in each group
```
- When the mean and the median are very close, the distribution of the data is probably "normal"
- When they are far apart, the data is skewed
- IQR is good for skewed data
- SD is good for "normal" data
```
quantile(DF$self_perceived_knowledge)
DF %>%
  pull(self_perceived_knowledge) %>% # grabs only the vector
  # (converts the data frame column into a vector)
  quantile(.) # the dot means "use whatever came down the pipe"
boxplot(DF$self_perceived_knowledge) # base R boxplot
Boxplot(DF$self_perceived_knowledge) # capital-B Boxplot from car also labels outliers
Boxplot(DF$overclaiming_proportion)
hist(DF$self_perceived_knowledge) # visualization
mean(DF$self_perceived_knowledge) # the point estimate
sd(DF$self_perceived_knowledge) # the variability
DF %>% # this is the pipe (Ctrl+Shift+M / Cmd+Shift+M)
  summarize(mean_self = mean(self_perceived_knowledge, na.rm = T),
            med_self = median(self_perceived_knowledge, na.rm = T),
            sd_self = sd(self_perceived_knowledge, na.rm = T))
```
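The spread statistics above have direct functions in base R; a small sketch on made-up numbers (not the class data) showing why IQR is more robust than SD when there is an outlier:

```r
scores <- c(2, 4, 4, 5, 7, 9, 30) # made-up data with one outlier (30)
sd(scores)      # standard deviation: pulled upward by the outlier
IQR(scores)     # interquartile range: barely affected by the outlier
median(scores)  # robust point estimate to pair with the IQR
```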
# Probability
- In the distribution functions, the first letter tells you what you get:
  - `d` = density --> gives you the probability (density) back
  - `r` = random --> lets you randomly sample from the distribution
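A minimal sketch of the `d` and `r` prefixes using the normal distribution (base R's `dnorm()` and `rnorm()`):

```r
set.seed(123)              # make the random draws reproducible
dnorm(0, mean = 0, sd = 1) # density of the standard normal curve at 0 (~0.399)
rnorm(5, mean = 0, sd = 1) # 5 random draws from that same distribution
```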